
Introduction to Natural Language Processing
ChengXiang Zhai
Department of Computer Science
University of Illinois, Urbana-Champaign
Lecture Plan
• What is NLP?
• A brief history of NLP
• The current state of the art
• NLP and text information systems
What is NLP?

Thai: …เรา เล่น ฟุตบอล … (“we play football”)

How can a computer make sense out of this string?
– Morphology: What are the basic units of meaning (words)?
– What is the meaning of each word?
– Syntax: How are words related with each other?
– Semantics: What is the “combined meaning” of words?
– Pragmatics: What is the “meta-meaning”? (speech act)
– Discourse: Handling a large chunk of text
– Inference: Making sense of everything
An Example of NLP

Sentence: “A dog is chasing a boy on the playground”

Lexical analysis (part-of-speech tagging):
A/Det dog/Noun is/Aux chasing/Verb a/Det boy/Noun on/Prep the/Det playground/Noun

Syntactic analysis (parsing):
[Parse tree: Sentence → Noun Phrase “a dog” + Verb Phrase; the Verb Phrase combines the Complex Verb “is chasing” with the Noun Phrase “a boy” and the Prep Phrase “on the playground”]

Semantic analysis:
Dog(d1). Boy(b1). Playground(p1). Chasing(d1,b1,p1).

Inference:
Scared(x) if Chasing(_,x,_).  =>  Scared(b1)

Pragmatic analysis (speech act):
A person saying this may be reminding another person to get the dog back…
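To make the inference step concrete, here is a minimal sketch in Python; the tuple-based fact representation and the rule function are assumptions of mine, not the lecture's notation:

```python
# Minimal forward-inference sketch (illustrative only; the lecture does not
# prescribe a representation). Facts are tuples: (predicate, arg1, arg2, ...).
facts = {
    ("Dog", "d1"),
    ("Boy", "b1"),
    ("Playground", "p1"),
    ("Chasing", "d1", "b1", "p1"),
}

def apply_scared_rule(facts):
    """Rule: Scared(x) if Chasing(_, x, _) — whoever is being chased is scared."""
    derived = set()
    for f in facts:
        if f[0] == "Chasing":
            chased = f[2]  # second argument of Chasing(chaser, chased, place)
            derived.add(("Scared", chased))
    return derived

print(apply_scared_rule(facts))  # {('Scared', 'b1')}
```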
If we can do this for all the sentences, then …
BAD NEWS:
Unfortunately, we can’t.
General NLP = “AI-Complete”
NLP is Difficult!!
• Natural language is designed to make human
communication efficient. As a result,
– we omit a lot of “common sense” knowledge, which
we assume the hearer/reader possesses
– we keep a lot of ambiguities, which we assume the
hearer/reader knows how to resolve
• This makes EVERY step in NLP hard
– Ambiguity is a “killer”!
– Common sense reasoning is a prerequisite
Examples of Challenges
• Word-level ambiguity: E.g.,
– “design” can be a noun or a verb (Ambiguous POS)
– “root” has multiple meanings (Ambiguous sense)
• Syntactic ambiguity: E.g.,
– “natural language processing” (Modification)
– “A man saw a boy with a telescope.” (PP Attachment)
• Anaphora resolution: “John persuaded Bill to buy a
TV for himself.” (himself = John or Bill?)
• Presupposition: “He has quit smoking.” implies that he
smoked before.
Despite all the challenges,
research in NLP has also made
a lot of progress…
High-level History of NLP

• Early enthusiasm (1950’s): Machine Translation
  – Too ambitious
  – Bar-Hillel report (1960) concluded that fully-automatic high-quality translation could not be accomplished without knowledge (Dictionary + Encyclopedia)
• Less ambitious applications (late 1960’s & early 1970’s): limited success, failed to scale up
  – Speech recognition
  – Dialogue (Eliza): shallow understanding
  – Inference and domain knowledge (SHRDLU = “block world”): deep understanding in a limited domain
• Real world evaluation (late 1970’s – now)
  – Story understanding (late 1970’s & early 1980’s): knowledge representation
  – Large scale evaluation of speech recognition, text retrieval, information extraction (1980 – now): robust component techniques
  – Statistical approaches enjoy more success (first in speech recognition & retrieval, later in other areas): statistical language models
• Current trend: learning-based NLP
  – Heavy use of machine learning techniques
  – Boundary between statistical and symbolic approaches is disappearing
  – We need to use all the available knowledge
  – Application-driven NLP research (bioinformatics, Web, question answering…)
The State of the Art

For the example sentence “A dog is chasing a boy on the playground”:

• POS tagging: 97%
• Parsing: partial parsing >90% (?)
• Semantics: only some aspects
  – Entity/relation extraction
  – Word sense disambiguation
  – Anaphora resolution
• Speech act analysis: ???
• Inference: ???
Technique Showcase: POS Tagging

Training data (annotated text):
This/Det sentence/N serves/V1 as/P an/Det example/N of/P annotated/V2 text/N …

The trained POS Tagger is applied to new text:
“This is a new sentence”  =>  This/Det is/Aux a/Det new/Adj sentence/N

Consider all possible tag assignments for w1=“this”, w2=“is”, … (t1=Det, t2=Det, …; t1=Det, t2=Aux, …; and so on), and pick the one with the highest probability.

Method 1: Independent assignment (most common tag per word)

p(w1,…,wk, t1,…,tk) ≈ p(t1|w1) … p(tk|wk) · p(w1) … p(wk)

Method 2: Partial dependency (each tag depends on the previous tag)

p(w1,…,wk, t1,…,tk) ≈ ∏_{i=1..k} p(wi|ti) · p(ti|ti−1)
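Method 2 is in effect a hidden Markov model, so the most probable tag sequence can be found with Viterbi dynamic programming. Below is a minimal Python sketch; all probability values are toy numbers I invented, since the slide gives none:

```python
# Viterbi decoding for Method 2: argmax over tag sequences of
# prod_i p(w_i|t_i) * p(t_i|t_{i-1}).  All probabilities below are
# invented toy values, not taken from the lecture.
emit = {  # p(word | tag)
    ("Det", "this"): 0.2, ("Det", "a"): 0.3,
    ("Aux", "is"): 0.5,
    ("Adj", "new"): 0.1,
    ("N", "sentence"): 0.05,
}
trans = {  # p(tag | previous tag); "<s>" marks the sentence start
    ("<s>", "Det"): 0.6, ("Det", "Aux"): 0.3, ("Aux", "Det"): 0.4,
    ("Det", "Adj"): 0.2, ("Adj", "N"): 0.5, ("Det", "N"): 0.5,
}
tags = ["Det", "Aux", "Adj", "N"]

def viterbi(words):
    # best[t] = (probability, tag sequence) of the best path ending in tag t
    best = {"<s>": (1.0, [])}
    for w in words:
        new_best = {}
        for t in tags:
            p_emit = emit.get((t, w), 0.0)
            cands = [(p * trans.get((prev, t), 0.0) * p_emit, seq + [t])
                     for prev, (p, seq) in best.items()]
            new_best[t] = max(cands)
        best = new_best
    return max(best.values())

print(viterbi(["this", "is", "a", "new", "sentence"]))
# -> (probability, ['Det', 'Aux', 'Det', 'Adj', 'N'])
```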
Technique Showcase: Parsing

Grammar:                        Lexicon:
S  → NP VP      1.0             V   → chasing      0.01
NP → Det BNP    0.3             Aux → is
NP → BNP        0.4             N   → dog          0.003
NP → NP PP      0.3             N   → boy
BNP → N                         N   → playground
VP → V                          Det → the
VP → Aux V NP                   Det → a
VP → VP PP                      P   → on
PP → P NP       1.0

Generate: “A dog is chasing a boy on the playground”

Tree 1 (probability of this tree = 0.000015), with the PP attached to the verb phrase:
S → NP [Det “a”  BNP [N “dog”]]  VP [VP [Aux “is”  V “chasing”  NP [Det “a”  BNP [N “boy”]]]  PP [P “on”  NP [Det “the”  BNP [N “playground”]]]]

Tree 2, with the PP attached to the noun phrase “a boy”:
S → NP [Det “a”  BNP [N “dog”]]  VP [Aux “is”  V “chasing”  NP [NP [Det “a”  BNP [N “boy”]]  PP [P “on”  NP [Det “the”  BNP [N “playground”]]]]]

Choose the tree with the highest probability…

Can also be treated as a classification/decision problem…
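To make the tree probability concrete: under a PCFG, a tree's probability is the product of the probabilities of all the rules it uses. A minimal Python sketch follows; the probabilities the slide leaves blank are filled in with invented values (marked “assumed”), so the printed number is illustrative rather than the slide's 0.000015:

```python
# Score a parse tree under a PCFG by multiplying rule probabilities.
# Trees are nested tuples: (label, child, child, ...); leaves are words.
# Probabilities marked "assumed" are NOT given on the slide.
rules = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "BNP")): 0.3,
    ("BNP", ("N",)): 1.0,             # assumed
    ("VP", ("VP", "PP")): 0.2,        # assumed
    ("VP", ("Aux", "V", "NP")): 0.5,  # assumed
    ("PP", ("P", "NP")): 1.0,
    ("Det", ("a",)): 0.4,             # assumed
    ("Det", ("the",)): 0.5,           # assumed
    ("N", ("dog",)): 0.003,
    ("N", ("boy",)): 0.002,           # assumed
    ("N", ("playground",)): 0.001,    # assumed
    ("Aux", ("is",)): 0.6,            # assumed
    ("V", ("chasing",)): 0.01,
    ("P", ("on",)): 0.9,              # assumed
}

def tree_prob(tree):
    if isinstance(tree, str):  # a bare word: probability comes from its rule
        return 1.0
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rules[(label, rhs)]
    for c in children:
        p *= tree_prob(c)
    return p

tree1 = ("S",
         ("NP", ("Det", "a"), ("BNP", ("N", "dog"))),
         ("VP",
          ("VP", ("Aux", "is"), ("V", "chasing"),
           ("NP", ("Det", "a"), ("BNP", ("N", "boy")))),
          ("PP", ("P", "on"),
           ("NP", ("Det", "the"), ("BNP", ("N", "playground"))))))
print(tree_prob(tree1))
```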
Semantic Analysis Techniques
• Only successful for VERY limited domains or for SOME aspects of semantics
• E.g.,
– Entity extraction (e.g., recognizing a person’s name):
Use rules and/or machine learning
– Word sense disambiguation: addressed as a
classification problem with supervised learning
– Sentiment tagging
– Anaphora resolution …
In general, exploiting machine learning
and statistical language models…
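As an illustration of the classification view of word sense disambiguation, here is a minimal scikit-learn sketch for the ambiguous word “root” from the earlier slide; the four training contexts and their sense labels are invented toy data:

```python
# Word sense disambiguation as supervised classification:
# features = bag of context words, labels = senses. Toy data, invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

contexts = [
    "the root of the tree absorbs water",
    "the square root of nine is three",
    "grass roots of the plant spread widely",
    "compute the root of the equation",
]
senses = ["plant", "math", "plant", "math"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(contexts, senses)
print(clf.predict(["find the root of this polynomial"]))  # expected: ['math']
```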
What We Can’t Do
• 100% POS tagging
– “He turned off the highway.” vs “He turned off the fan.”
• General complete parsing
– “A man saw a boy with a telescope.”
• Precise deep semantic analysis
– Will we ever be able to precisely define the meaning of “own” in
“John owns a restaurant.”?
Robust & general NLP tends to be “shallow”,
while “deep” understanding doesn’t scale up
Major NLP Applications

• Speech recognition: e.g., auto telephone call routing
• Text management and analysis (our focus)
  – Text retrieval/filtering
  – Text classification
  – Text summarization
  – Text mining
  – Query answering
• Language tutoring
  – Spelling/grammar correction
• Machine translation
  – Cross-language retrieval
  – Restricted natural language
• Natural language user interface
NLP & Text Information Systems

Better NLP => Better Text Information Systems
Bad NLP => Bad Text Information Systems?

Robust shallow NLP tends to be more useful than fragile deep NLP.
Errors in NLP can hurt a text information system.
How Much NLP is Really Needed?

Tasks ordered by increasing dependency on NLP; the “easier” the task, the more “workarounds” exist:

Classification/Retrieval → Summarization/Extraction/Topic Mining → Translation/Dialogue → Question Answering
Workaround Example I: ELIZA
ELIZA plays the role of a therapist
Person : Men are all alike.
ELIZA : In what way?
Person : They're always bugging us about something or other.
ELIZA : Can you think of a specific example?
Person : Well, my boyfriend made me come here.
ELIZA : Your boyfriend made you come here?
Person : He says I'm depressed much of the time.
ELIZA : I am sorry to hear that you are depressed.
Person : It's true. I am unhappy.
ELIZA : Do you think coming here will help you not to be
unhappy?
Person : I need some help, that much seems certain.
ELIZA : What would it mean to you if you got some help?
Person : Perhaps I could learn to get along with my mother.
ELIZA : Tell me more about your family
How does it work? Heuristic rules based on keyword matching! E.g.,
  “I remember X” => “Do you often think of X?”
  “always” => “Can you think of a specific example?”

It keeps the dialogue going, but NEVER makes any claim!

A breakdown example:
Person : Failure is the mother of success.
ELIZA : Tell me more about your family

No NLP, but useful. Perhaps we should call this NLP?
Statistical NLP often has a similar flavor, with “SOFT” rules LEARNED from data.
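A minimal Python sketch of ELIZA-style keyword matching; the first two rules are the ones quoted above, while the family rule and the fallback are my own invented stand-ins:

```python
import re

# ELIZA-style heuristic rules: (pattern, response template).
# The first two rules come from the slide; the rest are invented.
rules = [
    (r"\bI remember (.+)", "Do you often think of {0}?"),
    (r"\balways\b", "Can you think of a specific example?"),
    (r"\bmother\b|\bfather\b|\bfamily\b", "Tell me more about your family"),
]

def eliza_reply(utterance):
    for pattern, template in rules:
        m = re.search(pattern, utterance, flags=re.IGNORECASE)
        if m:
            return template.format(*m.groups())
    # no rule matched: a content-free prompt keeps the dialogue going
    return "Please go on."

print(eliza_reply("They're always bugging us about something or other."))
# -> Can you think of a specific example?
print(eliza_reply("Failure is the mother of success."))
# -> Tell me more about your family   (the breakdown example above)
```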
Workaround Example II: Statistical Translation

Learn how to translate Chinese to English from many example translations.

Intuitions:
- If we have seen all possible translations, then we simply look them up
- If we have seen a similar translation, then we can adapt it
- If we haven’t seen any example that’s similar, we try to generalize what we’ve seen

All these intuitions are captured through a probabilistic model: an English speaker generates English words E with probability P(E); a noisy channel (the “translator”) turns E into Chinese words C with probability P(C|E). To translate, we recover the most likely English source:

P(E|C) ∝ P(E) · P(C|E)
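A minimal sketch of decoding under the noisy-channel model: score each candidate English sentence E by P(E) · P(C|E) and return the argmax. The candidate set and every probability below are toy values I invented:

```python
# Noisy-channel translation: choose E maximizing P(E) * P(C|E).
# Toy tables invented for illustration; real systems estimate these
# from large parallel corpora.
p_e = {                      # language model P(E)
    "we play football": 0.004,
    "we play the football": 0.0001,
    "us play football": 0.00002,
}
p_c_given_e = {              # translation model P(C|E) for one Chinese input
    ("我们踢足球", "we play football"): 0.3,
    ("我们踢足球", "we play the football"): 0.4,
    ("我们踢足球", "us play football"): 0.35,
}

def translate(chinese, candidates):
    return max(candidates, key=lambda e: p_e[e] * p_c_given_e[(chinese, e)])

print(translate("我们踢足球", list(p_e)))  # -> "we play football"
```

Note the division of labor: P(C|E) prefers candidates faithful to the Chinese, while P(E) prefers fluent English, so the fluent "we play football" wins despite a lower channel score.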
So, what NLP techniques are most useful for text information systems?

Statistical NLP in general, and statistical language models in particular.

The need for high robustness and efficiency implies the dominant use of simple models (i.e., unigram models).
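To illustrate the kind of simple model meant here, a minimal unigram language model sketch: estimate p(w) from a document's word counts and score a query by the product of its word probabilities. The document text and the smoothing constant are my own toy choices:

```python
from collections import Counter

# Unigram language model: p(w) = count(w) / total, with words assumed
# independent of each other. The document text is invented toy data.
doc = "a dog is chasing a boy on the playground the boy is scared".split()
counts = Counter(doc)
total = len(doc)

def query_likelihood(query, alpha=0.0001):
    """Score a query by prod_w p(w|doc), with tiny additive smoothing
    so an unseen word does not zero out the whole product."""
    p = 1.0
    for w in query.split():
        p *= (counts[w] + alpha) / (total + alpha * len(counts))
    return p

print(query_likelihood("dog chasing boy"))  # higher score
print(query_likelihood("cat chasing boy"))  # lower: "cat" is unseen
```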
What You Should Know

• NLP is the foundation of text information systems
  – Better NLP enables better text management
  – Better NLP is necessary for sophisticated tasks
• But:
  – Bad NLP doesn’t mean bad text information systems
  – There are often “workarounds” for a task
  – Inaccurate NLP can hurt the performance of a task
• The most effective NLP techniques are often statistical, with the help of linguistic knowledge
• The challenge is to bridge the gap between imperfect NLP and useful application functions