Transcript Document

CSC550N
Natural Language
Understanding
Sean Kelly
7/17/2015
Introduction





Natural Language Understanding (NLU) is a
subset of Natural Language Processing (NLP)
NLP has always been popular in science fiction
NLP is composed of many smaller areas of
research overlapping and combining into a
bigger concept
Many real world implementations of NLP
Many obstacles hinder NLU progress
Popular in Science Fiction

HAL in 2001: A Space Odyssey
Acted as a psychotherapist, entertainer, and friend for
the astronauts (until he killed them)
The Computer in Star Trek
Accepted voice commands from crewmembers while
understanding context
The Holographic Doctor in Star Trek
A natural language processor attached to a physical
image. Was able to understand spoken words as well
as people.
Areas of Study in NLP









Natural language understanding
Text to speech
Speech recognition
Natural language generation
Machine translation
Question answering
Information retrieval
Information extraction
Text-proofing
Narrowing the Scope
Because NLP is such a large field, this
presentation will focus only on how
computers can understand sentences, or
NLU.
For example, what does this mean?
The cars went fast.
HOW IT WORKS
Syntax and Semantics
How to Understand



We have already seen a computer program that
can take sentences as input and provide a
response as output. This program was called
Eliza.
Eliza did not understand. While it implemented
simplistic syntax, it lacked semantics.
Eliza had no concept of meaning. It relied on
sentence structure and rules to generate output.
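A minimal sketch of rule-driven output of this kind. The two patterns and the reply templates below are invented for illustration and are not Eliza's actual script; the point is only that a reply can be produced from sentence structure without any model of meaning.

```python
import re

# Hypothetical rules: (pattern, reply template). Not Eliza's real script.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
]
DEFAULT_REPLY = "Please tell me more."

def respond(sentence: str) -> str:
    """Build a reply from surface structure only; no meaning is involved."""
    text = sentence.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Echo the matched fragment back inside a canned template.
            return template.format(match.group(1))
    return DEFAULT_REPLY

print(respond("I am worried about Mike."))
print(respond("The cars went fast."))
```

The second call falls through to the default reply: no rule matches, and the program has no way to reason about what the sentence says.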
Syntax and Semantics

In order to understand, software must first be
able to parse sentence structure (syntax) and
determine the meaning of the words in the sentence
(semantics).

Syntax: the grammatical arrangement of words
in sentences

Semantics: the study of language meaning
Syntax



Use the sentence structure to your advantage
By breaking down a sentence into its
components (nouns, verbs, subject, etc.), you
can infer many things.
Example: Use the subject to determine the
actor in these sentences:
Mike depended on Damien.
Damien sent a bill to Mike.
Breaking It Down

Once broken down into components, sentences can be
expressed in trees, graphs, or other computer-friendly
formats.
Donner paid Damien.
could become:
(sentence
  (nounphrase (propernoun 'Donner))
  (verbphrase (verb 'paid))
  (nounphrase (propernoun 'Damien)))
[Figure: parse tree of "Donner paid Damien." with node labels
Sentence, Noun phrase, Verb phrase, Proper noun, Verb]
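A small sketch of how the bracketed form could be held in a program. Nested Python tuples stand in for the Lisp-style list; the helper names `leaves` and `subject`, and the rule "the first noun phrase is the subject", are assumptions made for illustration.

```python
# Nested tuples mirror the bracketed form: (label, child, child, ...).
# A leaf is a (label, word) pair whose second item is a plain string.
tree = (
    "sentence",
    ("nounphrase", ("propernoun", "Donner")),
    ("verbphrase", ("verb", "paid")),
    ("nounphrase", ("propernoun", "Damien")),
)

def leaves(node):
    """Yield (part_of_speech, word) pairs from left to right."""
    label, *children = node
    for child in children:
        if isinstance(child, str):
            yield (label, child)
        else:
            yield from leaves(child)

def subject(sentence_node):
    """Treat the first noun phrase as the subject (a simplifying assumption)."""
    for child in sentence_node[1:]:
        if child[0] == "nounphrase":
            return " ".join(word for _, word in leaves(child))
    return None

print(list(leaves(tree)))
print(subject(tree))
```

With the structure in hand, a question such as "who is the actor?" becomes a tree lookup rather than a string search.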
Structural Ambiguity

Structural ambiguity arises when a sentence can
be parsed into different trees
Up and down or left and right.
could be:
(Up and down) or (left and right)
Up and (down or (left and right))
Up and ((down or left) and right)
…
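The groupings can be enumerated mechanically. The sketch below assumes every connective joins exactly two parts and lists every way to bracket the phrase; the function name `bracketings` is made up for this example.

```python
def bracketings(words, connectives):
    """Return every full binary grouping of words joined by connectives.

    connectives[i] sits between words[i] and words[i + 1].
    """
    if len(words) == 1:
        return [words[0]]
    results = []
    # Pick each connective in turn as the top-level split point.
    for i, connective in enumerate(connectives):
        lefts = bracketings(words[: i + 1], connectives[:i])
        rights = bracketings(words[i + 1 :], connectives[i + 1 :])
        for left in lefts:
            for right in rights:
                results.append(f"({left} {connective} {right})")
    return results

parses = bracketings(["Up", "down", "left", "right"], ["and", "or", "and"])
for parse in parses:
    print(parse)
print(len(parses), "possible trees")
```

Four words joined by three connectives give five distinct trees, and the count rises quickly as phrases get longer.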
Semantics




More complicated than syntax.
Must understand the meaning of each word, and
also how each word interacts with the other
words in the sentence.
When many sentences must be understood together,
semantic analysis must also infer context from the
other sentences.
Example: How many meanings can you find for
this sentence?
Mike got it.
Search Space Explosion





Semantic parsing requires determining the
meaning of each word in a sentence
If each word in a five-word sentence has four
possible meanings, the search space is 4^5, or
1,024
If each word in the sentence has six meanings,
the search space is larger still: 6^5, or 7,776.
Search space grows quickly for long sentences
containing several words with multiple meanings.
Most English sentences are of this form.
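The growth is easy to check: when every word has the same number of candidate meanings, the number of combined readings is that number raised to the power of the sentence length. A minimal sketch (the function name is an assumption):

```python
def search_space(word_count: int, meanings_per_word: int) -> int:
    """Sense combinations when every word has the same number of meanings."""
    return meanings_per_word ** word_count

for meanings in (2, 4, 6):
    total = search_space(5, meanings)
    print(f"5 words x {meanings} meanings each: {total:,} combinations")

# Sentence length matters even more than the number of meanings.
print(f"10 words x 4 meanings each: {search_space(10, 4):,} combinations")
```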
Global and Local Ambiguity


Local ambiguity arises when part of a sentence
could have multiple meanings
Global ambiguity is caused by an entire
sentence having multiple meanings. This must
be solved using context and other means.
Local ambiguity
I know more about computers than
Donner, although he knows quite a
lot.
Global ambiguity
I saw a fire driving to class.
Referential Ambiguity
Referential ambiguity arises when a noun or pronoun
in a sentence can refer to many things
This is very common in language.
They are coming.
Who are ‘they’?
Incomplete Sentences


Incomplete sentences cause problems because
the writer assumes that the reader is able to
derive the meaning of a sentence
Computers are bad at doing this, due to their
limited semantic knowledge
Mike is packing up and flying home. So is
she.
Is she just packing up, just flying home, or doing
both?
The Semantic Problem
After a sentence is parsed into a tree or other “computer
friendly” form, it is still necessary to
understand its meaning.
Understanding context is very difficult,
especially in general-purpose systems
(such as the Star Trek computer).
Most systems use a combination of
techniques to understand context.
The Blinkered Approach




Assume there are no context ambiguities.
Sentences are only parsed for pre-programmed
context, such as database-related queries in a
database system or medical questions in a
medical system.
This approach is best suited for special purpose
applications.
Example: If a user asks for information on broken
legs, assume they mean human legs and not
table legs.
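One way to picture the blinkered approach is a lookup that only ever consults the senses belonging to the system's own domain. The sense table and domain names below are invented for this sketch.

```python
from typing import Optional

# Hypothetical sense inventory, invented for illustration.
SENSES = {
    "leg": {"medical": "human limb", "furniture": "table support"},
    "cast": {"medical": "rigid dressing for a broken bone",
             "theatre": "actors in a play"},
}

def interpret(word: str, domain: str) -> Optional[str]:
    """Return the one sense the system was built for; others are never considered."""
    return SENSES.get(word, {}).get(domain)

# A medical system is "blinkered" to the medical sense.
print(interpret("leg", "medical"))
```

Because the domain is fixed in advance, no disambiguation step runs at all. The cost is that the same system is of no use for a question about furniture.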
The Co-operative Approach





When contextual ambiguity arises, ask for
clarification
Good for interactive systems
Very similar to how people act when they don’t
understand
This approach can become ineffective if it asks
for clarification too often.
Example: Software that answers questions from
a user about finding geographic locations. “Do
you mean Washington state or Washington
D.C.?”
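A rough sketch of the co-operative approach for the geographic example. The place table is made up, and `ask` is a stand-in for whatever channel the system uses to put a question to the user (for example the built-in `input`).

```python
# Hypothetical gazetteer, invented for illustration.
PLACES = {
    "washington": ["Washington state", "Washington D.C."],
    "olympia": ["Olympia, Washington"],
}

def resolve(name, ask):
    """Return one place for name, asking the user only when it is ambiguous."""
    candidates = PLACES.get(name.lower(), [])
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    # More than one reading: hand the decision back to the user.
    answer = ask("Do you mean " + " or ".join(candidates) + "?")
    return answer if answer in candidates else None

# Simulated user who always picks the second option.
print(resolve("Washington", lambda question: "Washington D.C."))
print(resolve("Olympia", lambda question: ""))
```

The question is only asked when there is a real choice to make, which keeps the system from becoming tiresome to use.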
The Common Sense Approach



This approach involves giving the software an
encyclopedia of knowledge to make decisions
from
Software uses “common sense” derived from
this bank of information to derive semantic
meaning and solve semantic conflicts.
While good in theory, the search space of a
decent information repository makes this
approach slow and tedious, if not impossible.
PROJECTS AND
MILESTONES
Actual implementations
SHRDLU
Designed by Terry Winograd at MIT
Implemented a small “block world” in
MacLISP in which users could ask the
computer to do things with the basic
objects
Basic objects in the world included
pyramids, blocks, balls, cones, …
Example SHRDLU Session
Person: PICK UP A BIG RED BLOCK.
Computer: OK.
Person: GRASP THE PYRAMID.
Computer: I DON'T UNDERSTAND WHICH PYRAMID
YOU MEAN.
Person: FIND A BLOCK WHICH IS TALLER THAN THE
ONE YOU ARE HOLDING AND PUT IT INTO THE BOX.
Computer: BY "IT", I ASSUME YOU MEAN THE BLOCK
WHICH IS TALLER THAN THE ONE I AM HOLDING.
Computer: OK.
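The "WHICH PYRAMID" reply can be imitated with a toy world model. This is not how SHRDLU was implemented; the object list, the function names, and the "can't find" reply are assumptions made for the sketch. It only shows why a definite reference ("the pyramid") fails when more than one object fits.

```python
# Toy world, invented for illustration; not SHRDLU's actual data.
WORLD = [
    {"shape": "block", "color": "red", "size": "big"},
    {"shape": "block", "color": "green", "size": "small"},
    {"shape": "pyramid", "color": "blue", "size": "small"},
    {"shape": "pyramid", "color": "red", "size": "big"},
]

def find(**attributes):
    """Return every object whose attributes match all the given values."""
    return [obj for obj in WORLD
            if all(obj.get(key) == value for key, value in attributes.items())]

def grasp_the(**attributes):
    """A definite reference ("the ...") needs exactly one matching object."""
    matches = find(**attributes)
    if len(matches) == 1:
        return "OK."
    if not matches:
        return "I CAN'T FIND THAT OBJECT."  # hypothetical wording
    kind = attributes.get("shape", "object").upper()
    return f"I DON'T UNDERSTAND WHICH {kind} YOU MEAN."

print(grasp_the(shape="pyramid"))
print(grasp_the(shape="pyramid", color="red"))
```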
SHRDLU Screenshot
The CYC Project
Began in 1984
Ran for 12 years
Goal was to teach as much real world
information as possible to a computer,
thus enhancing semantic processing
“Knowledge engineers” employed to sift
through encyclopedias and other sources
and program relations and meanings
The CYC Project

Problems with this system





Huge search space
Slow semantic processing
Some believe that humans do not process logic in the
same way as CYC
No “common sense” system has been derived
directly from CYC to date
Cycorp continued the research to develop
areas such as distributed AI, intelligent
searching, and “smart” database integration.
CONTEX




A system that can be used to translate limited
English sentences to other languages
Has been tested on select Wall Street Journal
excerpts with a translation quality of around 72%
Uses both syntactical and semantic background
knowledge
Learns from examples
MSR-MT





Microsoft Research’s Machine Translation
system is the result of over a decade of research
A data-driven, self-customizing system
that learns from previously translated materials
A document in two languages is fed to MSR-MT
for training
Mapping function correlates the two languages
based on previous data
Correlations are stored in a repository called
“MindNet”
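The idea of learning correspondences from a document that exists in two languages can be illustrated with a toy co-occurrence count. This is not MSR-MT's actual algorithm, nor MindNet's format; the sentence pairs, the Dice score, and the function name are assumptions chosen to keep the sketch small.

```python
from collections import Counter
from itertools import product

def learn_mapping(sentence_pairs):
    """Map each source word to the target word it co-occurs with most strongly."""
    source_counts, target_counts, pair_counts = Counter(), Counter(), Counter()
    for source, target in sentence_pairs:
        source_words = set(source.split())
        target_words = set(target.split())
        source_counts.update(source_words)
        target_counts.update(target_words)
        # Count every source/target word pair seen in the same sentence pair.
        pair_counts.update(product(source_words, target_words))

    mapping = {}
    for s in source_counts:
        def dice(t):
            # Dice coefficient: shared occurrences relative to total occurrences.
            return 2 * pair_counts[(s, t)] / (source_counts[s] + target_counts[t])
        mapping[s] = max(target_counts, key=dice)
    return mapping

# Tiny aligned "document": English on the left, Spanish on the right.
pairs = [
    ("open file", "abrir archivo"),
    ("close file", "cerrar archivo"),
    ("open folder", "abrir carpeta"),
]
print(learn_mapping(pairs))
```

Even this crude count recovers the word pairs from three aligned sentences, which gives a feel for why more previously translated material leads to better mappings.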
MSR-MT
Has been used to translate Microsoft’s
Knowledge Base to Spanish
System can be trained off hundreds of
thousands of Microsoft texts in a single
night
MSR-MT (English document)
MSR-MT (Spanish translation)
Picture from http://research.microsoft.com/nlp/Projects/MTproj.aspx
In Conclusion
NLP is a big field of research
In order for a machine to understand, it
needs to process both syntax and
semantics
The abstract nature of human language is
an obstacle in the NLP field, with no
solution in the near future.
Despite current limitations, NLP can be
used in many positive and powerful ways.
Special thanks to Mike
Donner for being the
subject of every
single example in this
presentation without
knowing it.
Sources and Related Material

http://www.scism.sbu.ac.uk/inmandw/tutorials/nlp/
http://www2.cs.cmu.edu/Groups/AI/html/faqs/ai/nlp/nlp_faq/faq.html
http://www.ai.sri.com/natural-language/naturallanguage.html
http://www.aaai.org/AITopics/html/natlang.html
http://en.wikipedia.org/wiki/Natural_language_understanding
http://en.wikipedia.org/wiki/SHRDLU

http://www.isi.edu/~ulf/diss_abstract.html





Sources and Related Material

http://research.microsoft.com/nlp/Projects/MTproj.aspx