Transcript Natural Language Understanding
Natural Language Processing (1)
Zhao Hai (赵海)
Department of Computer Science and Engineering, Shanghai Jiao Tong University
[email protected]
Outline
Course Goals
Course Schedule
Course Requirements
Overview
2020 年 4 月 25 日 6 时 45 分 2
Course Goals
Introduce the know-how of NLP (NLU), including research highlights, crucial technologies, and application achievements;
Train students to read and evaluate new academic papers from important international conferences in related areas, such as ACL;
Encourage students to present and discuss their comments on the papers;
Accomplish a practical NLP system through a course project.
Course Schedule (1)
1. Overview (2 lhs; lh = lecture hour)
1.1 Natural Language Processing/Understanding
1.2 Different Levels of Language Analysis
1.3 Applied Approaches in NLU Systems
1.4 Applications of NLU
Course Schedule (2)
2. Lexicons and Lexical Analysis (8 lhs)
2.1 Lexicon: A Language Resource
2.2 A Lexicon for English Words: WordNet
2.3 Generative Lexicon
2.4 Finite State Models and Morphological Analysis
2.5 Collocations
2.6 Statistical Inference: n-gram Models over Sparse Data
Course Schedule (3)
3. Syntactic Processing (8 lhs)
3.1 Basic English Syntax
3.2 Grammars and Parsing
3.3 Features and Augmented Grammars
3.4 Grammars for Natural Language
3.5 Toward Efficient Parsing
Course Schedule (4)
4. Learning Approaches for Natural Language Processing (8 lhs)
4.1 Main machine learning approaches: maximum entropy, k-nearest neighbor, support vector machine
4.2 Sequence labeling: HMM, Maximum Entropy Markov Model, and CRFs
4.3 A case study: training a part-of-speech tagger from a labeled corpus
Course Schedule (5)
5. Course Discussion (1 lh)
5.1 Discussion of the Given Course Content
5.2 How to Prepare for the Paper Reading
5.3 Other Related Issues
6. Students Workshop (4 lhs)
6.1 ACL/EMNLP Paper Reading Groups
6.2 Presentation and Discussion
Course Requirements (1)
1. Texts and References
James Allen. Natural Language Understanding (2nd ed.). The Benjamin/Cummings Publishing Company, Inc., 1995.
Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. The MIT Press, 1999.
Course Requirements (2)
2. Online Literature
ACL Anthology: http://www.aclweb.org/anthology-new/
Other related references.
Course Requirements (3)
3. Final Grade
Attendance and Assignments
ACL/EMNLP Paper Summary: 30%
Comment and Presentation: 30%
Course Project: 40%
Course Requirements (4)
4. Curriculum Schedule
Time: 3rd and 4th classes, Tuesday morning; 7th and 8th classes, Thursday afternoon; weeks 9-16.
Location: East Up Building 403
Course Requirements (5)
5. Web Site and Contact Email
http://bcmi.sjtu.edu.cn/~zhaohai/nlp4ieee2014/index.html
TA Xin Yang [email protected]
Overview (1)
Natural Language Understanding (1)
What is Natural Language?
It means human language (vs. formal languages and programming languages).
The most common way that people communicate is by speaking or writing in one of the natural languages, such as English, Chinese, German, or French.
There are two forms of natural language, written and spoken: text and speech.
Overview (2)
Natural Language Understanding (2)
NLP & NLU (1)
NLP (Natural Language Processing) covers all methods for the pure processing of language by algorithmic, statistical, heuristic, and similar means.
NLU (Natural Language Understanding) denotes the real understanding of a text formulated in some natural language. Is this a semantic or a pragmatic issue?
Overview (3)
Natural Language Understanding (3)
NLP & NLU (2)
Information Retrieval (IR): NLP
Information Extraction (IE): NLP with NLU (Shallow Parsing)
Summarization: NLP with NLU (Shallow and Deep Parsing)
Question Answering (QA): NLP with NLU (Shallow or Deep Parsing)
Machine Translation (MT): NLP with NLU (Deep Parsing)
Natural Language Generation (NLG): NLP
…
Overview (4)
Natural Language Understanding (4)
Computational Linguistics (1)
Research in computational linguistics, the use of computers in the study of languages, started soon after computers became available in the 1940s.
This discipline, along with AI and other disciplines, promoted the progress of NLU.
It is the part of linguistics that also focuses on computational issues.
Overview (5)
Natural Language Understanding (5)
Computational Linguistics (2)
[Figure: computational linguistics and its related fields: engineering science, bioscience, psychology, AI, cognitive science, computer science, philosophy, and linguistics.]
Overview (6)
Natural Language Understanding (6)
Why is NLU a Difficult Task? (1)
Complexity of the target representation into which the matching is being done
In fact, understanding natural language means transforming it from one representation into another. Extracting meaningful information from the source representation often requires the use of additional knowledge.
Overview (7)
Natural Language Understanding (7)
Why is NLU a Difficult Task? (2)
Type of mapping
There are one-to-one, many-to-one, one-to-many, and many-to-many mappings. One-to-many mappings require a great deal of domain knowledge beyond the input to make the correct choice among target representations.
For example (one-to-many): a) a tall giraffe vs. b) tall the truth
Overview (8)
Natural Language Understanding (8)
Why is NLU a Difficult Task? (3)
Level of interaction of the components of the source representation
In many natural language sentences, changing a single word can alter the interpretation of the entire structure.
As the number of interactions increases, so does the complexity of the mapping.
Overview (9)
Natural Language Understanding (9)
Why is NLU a Difficult Task? (4)
Modifier attachment problem
The sentence "Give me all the employees in a division making more than $50,000" doesn't make it clear whether the speaker wants all employees making more than $50,000, or only those in divisions making more than $50,000.
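The two readings of the employee query can be made concrete as two different filters over the same data. The following is only a minimal sketch: the records, the per-division figures, and the function names are all hypothetical and chosen for illustration.

```python
# Hypothetical toy data: each employee has a salary and belongs to a division;
# each division has its own (assumed) earnings figure.
employees = [
    {"name": "Ann", "division": "auto", "salary": 60_000},
    {"name": "Bob", "division": "auto", "salary": 40_000},
    {"name": "Cy", "division": "aircraft", "salary": 70_000},
]
division_earnings = {"auto": 30_000, "aircraft": 90_000}
THRESHOLD = 50_000


def employees_making_more(emps, threshold):
    """Reading 1: 'making more than ...' modifies 'employees'."""
    return [e["name"] for e in emps if e["salary"] > threshold]


def employees_in_divisions_making_more(emps, earnings, threshold):
    """Reading 2: 'making more than ...' modifies 'division'."""
    return [e["name"] for e in emps if earnings[e["division"]] > threshold]


reading_1 = employees_making_more(employees, THRESHOLD)
reading_2 = employees_in_divisions_making_more(
    employees, division_earnings, THRESHOLD
)
```

The same sentence yields two different result sets, so a system has to choose an attachment before it can answer.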
Overview (10)
Natural Language Understanding (10)
Why is NLU a Difficult Task? (5)
Quantifier scoping problem
Words such as “the”, “each”, or “what” express “universal” (∀) or “existential” (∃) quantification in logic. They can have several readings.
Elliptical utterances
The interpretation of a query may depend on previous queries and their interpretations. E.g., asking "Who is the manager of the automobile division?" and then saying, "Of aircraft?"
Overview (12)
Natural Language Understanding (12)
Machine Translation (1)
In 1949, Warren Weaver proposed that computers might be useful for “the solution of world-wide translation problems”.
However, even after more than 50 years of effort, current systems still produce output of limited quality, which is suitable for the assimilation of foreign-language documents, but not for the production of publishable material.
Overview (13)
Natural Language Understanding (13)
Machine Translation (2)
Through practice, researchers have realized that human language translation is a complex cognitive ability involving knowledge of different kinds:
the structure of sentences;
the meaning of words;
a model of the listener (user model);
the rules of conversation (dialogue translation);
an extensive shared body of general information about the world.
Overview (14)
Natural Language Understanding (14)
Machine Translation (3)
Some forms of translation for information access are already available on the web at no cost, e.g.
http://translate.google.com/?hl=zh-CN&tab=wT#auto|en|
http://fanyi.baidu.com/
The increasing demand for these services will give a push to improve their quality;
Translation providers will find ways to increase vocabularies and translation quality semi-automatically from terminological resources, bilingual corpora, and similar sources.
Overview (16)
Natural Language Understanding (16)
Investigation Goals
AI researchers in natural language processing expected their work to lead both to:
the development of practical, useful language understanding systems, and
a better understanding of language and the nature of intelligence.
Overview (17)
Different Levels of Language Analysis (1)
Six Analysis Levels for Written Texts
Morphological Analysis (Lexical Analysis)
Syntactic Analysis (Deep & Shallow Parsing)
Semantic Analysis
Pragmatic Analysis
Discourse Analysis (Text Analysis)
World Knowledge Analysis (is it possible?)
Overview (18)
Different Levels of Language Analysis (2)
Morphological Analysis (1)
It is the identification of a word stem from a full word form (and sometimes also the identification of the syntactic category of the stem). For example, the word friendly is formed from the noun (stem) friend and the suffix -ly, which transforms a noun into an adjective.
Overview (19)
Different Levels of Language Analysis (3)
Morphological Analysis (2)
Most systems that analyze natural language text start by segmenting the text into meaningful tokens. In general, this procedure includes:
tokenization (segmentation),
normalization (stemming),
POS (part-of-speech) tagging,
named entity / phrase identification.
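The first steps of this procedure can be sketched in a few lines. This is a minimal sketch under stated assumptions: the token pattern, the tiny tag lexicon, and the tag names are hypothetical, normalization is reduced to lowercasing (no stemming), and named entity identification is omitted.

```python
import re

# Assumed token pattern: runs of letters, runs of digits, or a single
# non-space symbol such as punctuation.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]")

# Hypothetical toy lexicon mapping normalized tokens to POS tags.
TOY_POS_LEXICON = {"the": "DET", "dogs": "NOUN", "barked": "VERB", ".": "PUNCT"}


def tokenize(text):
    """Segment raw text into tokens."""
    return TOKEN_PATTERN.findall(text)


def normalize(token):
    """Normalize a token; here only by lowercasing."""
    return token.lower()


def tag(tokens):
    """Assign a POS tag to each token by lexicon lookup ("UNK" if unknown)."""
    return [(t, TOY_POS_LEXICON.get(normalize(t), "UNK")) for t in tokens]


tagged = tag(tokenize("The dogs barked."))
```

A trained tagger would replace the lexicon lookup; the point here is only the order of the steps.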
Overview (20)
Different Levels of Language Analysis (4)
Syntactic Analysis (1)
Its goal is to break down given textual units, e.g. sentences, into smaller constituents, to assign categorical labels to them, and to identify the grammatical relations between the various parts.
In most parsers, the grammar is separated from the processing components. The grammar consists of a lexicon, and rules that syntactically and semantically combine words and phrases into larger phrases and sentences.
Overview (21)
Different Levels of Language Analysis (5)
Syntactic Analysis (2)
The output of a shallow parser is less complete than that of a deep or full parser; that is, it is not a phrase-structure tree. A shallow parser may identify some phrasal constituents, such as noun phrases, without indicating their internal structure or their function in the sentence. It has the advantages of efficiency and robustness.
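A shallow parser of this kind can be sketched as a noun-phrase chunker over POS-tagged tokens. The sketch below is an assumption-laden illustration: the tag names and the chunk pattern (optional determiner, any adjectives, one or more nouns) are chosen for the example and are not taken from a specific system.

```python
def chunk_noun_phrases(tagged):
    """Group DET? ADJ* NOUN+ runs into flat NP chunks.

    `tagged` is a list of (word, tag) pairs. Tokens outside any NP are
    returned as single-word chunks labeled with their own tag.
    """
    chunks = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DET":          # optional determiner
            j += 1
        while j < n and tagged[j][1] == "ADJ":   # any number of adjectives
            j += 1
        k = j
        while k < n and tagged[k][1] == "NOUN":  # one or more nouns
            k += 1
        if k > j:
            # At least one noun was found: emit a flat NP with no inner structure.
            chunks.append(("NP", [word for word, _ in tagged[i:k]]))
            i = k
        else:
            chunks.append((tagged[i][1], [tagged[i][0]]))
            i += 1
    return chunks


sentence = [
    ("the", "DET"), ("green", "ADJ"), ("frogs", "NOUN"),
    ("have", "VERB"), ("large", "ADJ"), ("noses", "NOUN"),
]
chunks = chunk_noun_phrases(sentence)
```

The chunker makes a single left-to-right pass and never fails on unexpected input, which illustrates the efficiency and robustness mentioned above; the price is that the NPs are flat and their role in the sentence is not identified.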
Overview (23)
Different Levels of Language Analysis (7)
Semantic Analysis (1)
The goal of semantic analysis is to assign a complete meaning to each utterance, covering both the meanings of its words and the way those meanings combine. This meaning is context-independent.
Overview (24)
Different Levels of Language Analysis (8)
Semantic Analysis (2)
The task of semantic analysis can be divided into several subtasks, depending on the linguistic level where it takes place. The most important subtasks are the semantic tagging of ambiguous words and phrases, and the resolution of referring expressions.
Overview (25)
Different Levels of Language Analysis (9)
Pragmatic Analysis
It depicts the relationships between the symbols of texts (talks) and their producers / users. Here, producers and users are writers / readers and speakers / hearers. In other words, the context of situation has a significant impact on the interpretation of a discourse.
Overview (26)
Different Levels of Language Analysis (10)
Discourse Analysis
Extracting the knowledge contained in texts requires more than the resolution of local semantic ambiguities. Discourse analysis needs to consider the global argumentative structure of texts. In addition, it analyzes the relationships between sentences in a text. This analysis is especially important for interpreting pronouns and temporal constituents.
Overview (27)
Different Levels of Language Analysis (11)
World Knowledge Analysis
It analyzes and infers the general world knowledge that each language user must have, e.g. the other user's beliefs and goals in a conversation.
Overview (28)
Different Levels of Language Analysis (12)
Examples
Consider each example below as a candidate for the initial sentence of a book concerning natural language processing:
1. Green frogs have large noses. [ grammar √ semantics √ ]
2. Green ideas have large noses. [ grammar √ semantics X ]
3. Large have green ideas nose. [ grammar X semantics X ]
Overview (29)
Applied Approaches in NLU Systems (1)
Historical Categories
A classification borrowed from Winograd (1972) groups NLU approaches according to how they represent and use knowledge of their subject matter. On this basis, they can be divided into four historical categories.
Overview (30)
Applied Approaches in NLU Systems (2)
Historical Categories
The earliest approach, with limited results in specific, constrained domains (BASEBALL, SAD-SAM, STUDENT and ELIZA);
Text-based approach (PROTOSYNTHEX-I and Semantic Memory);
Limited logic-based approach (SIR, TLC, DEACON and CONVERSE);
Knowledge-based approach (LUNAR, SHRDLU, MARGIE, SAM and LIFER).
Overview (31)
Applied Approaches in NLU Systems (3)
BASEBALL [Bert Green, 1963]
An information retrieval program with a large database of facts about all American League games over a given year. It accepted input questions from the user, limited to one clause with no logical connectives.
Overview (32)
Applied Approaches in NLU Systems (4)
SAD-SAM [Lindsay, 1963]
Syntactic Appraiser and Diagrammer -- Semantic Analyzing Machine. Programmed by Robert Lindsay in 1963 at CMU. It uses a basic English vocabulary (1,700 words) and follows a context-free grammar. It parses input from left to right, builds derivation trees, and passes them to SAM, which extracts the semantically relevant information to build family trees and find answers to questions.
Overview (33)
Applied Approaches in NLU Systems (5)
ELIZA [Weizenbaum, 1966]
It was built at MIT in 1966 and was the most famous pattern-matching natural language system. The system assumes the role of a Rogerian, or “nondirective”, therapist in its dialog with the user. It operated by matching the left sides of its rules against the user’s last sentence, and using the appropriate right side to generate a response. Rules were indexed by keywords, so only a few had to be matched against a particular sentence. Some rules had no left side, so they could apply anywhere.
Overview (34)
Applied Approaches in NLU Systems (6)
ELIZA: Sample Data
Word    Rank  Pattern        Outputs
alike   10    ?X             In what way?
                             What resemblance do you see?
are     3     ?X are you ?Y  Would you prefer it if I weren’t ?Y?
are     3     ?X are ?Y      What if they were not ?Y?
always  5     ?X             Can you think of a specific example?
                             When?
                             Really, always?
what    2     ?X             Why do you ask?
                             Does that interest you?
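The keyword-indexed matching described for ELIZA can be sketched with the sample data above. This is an illustrative sketch, not Weizenbaum's implementation: encoding the ?X / ?Y patterns as regular expressions, always choosing the first listed output, and the `respond` function itself are assumptions made for the example.

```python
import re

# (keyword, rank, pattern, outputs), following the sample data.
# The regular expressions are an assumed encoding of the ?X / ?Y patterns.
RULES = [
    ("alike", 10, r".*", ["In what way?", "What resemblance do you see?"]),
    ("are", 3, r".*\bare you (?P<Y>.+)", ["Would you prefer it if I weren't {Y}?"]),
    ("are", 3, r".*\bare (?P<Y>.+)", ["What if they were not {Y}?"]),
    ("always", 5, r".*", ["Can you think of a specific example?", "When?", "Really, always?"]),
    ("what", 2, r".*", ["Why do you ask?", "Does that interest you?"]),
]


def respond(sentence):
    """Return a response for the user's sentence, or None if no rule applies."""
    text = sentence.lower().rstrip(".!?")
    words = set(re.findall(r"[a-z']+", text))
    # Only rules whose keyword occurs in the sentence are considered.
    candidates = [rule for rule in RULES if rule[0] in words]
    # Try higher-ranked keywords first; sorted() is stable, so rules with
    # equal rank keep their table order.
    for _keyword, _rank, pattern, outputs in sorted(candidates, key=lambda r: -r[1]):
        match = re.match(pattern, text)
        if match:
            # Simplification: always use the first output of the rule.
            return outputs[0].format(**match.groupdict())
    return None
```

With these rules, "Men are all alike." triggers both the "are" and "alike" keywords, and the higher rank of "alike" selects "In what way?".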
Overview (35)
Applied Approaches in NLU Systems (7)
ELIZA: A Dialogue
User: Men are all alike.
ELIZA: In what way?
User: They’re always bugging us about something or other.
ELIZA: Can you think of a specific example?
User: Well, my boyfriend made me come here.
ELIZA: Your boyfriend made you come here.
User: He says I’m depressed much of the time.
ELIZA: I am sorry to hear you are depressed.
User: It’s true, I am unhappy.
ELIZA: Do you think coming here will help you not to be unhappy?
User: I need some help; that much seems certain.
ELIZA: What would it mean to you if you got some help?
User: Perhaps I could learn to get along with my mother.
ELIZA: Tell me more about your family.
User: My mother takes care of me.
ELIZA: Who else in your family takes care of you?
User: My father.
ELIZA: Your father.
User: You are like my father in some ways.
ELIZA: What resemblance do you see?
Overview (36)
Applied Approaches in NLU Systems (8)
SIR [Bertram Raphael, 1968]
Semantic Information Retrieval System. It was a prototype “understanding” machine, since it could accumulate facts and then make deductions about them in order to answer questions.
Overview (37)
Applied Approaches in NLU Systems (9)
LUNAR [William Woods, 1973] (1)
LUNAR answered questions about the rock samples brought back from the moon using two databases: the chemical analyses and the literature references. Specifically, it helped geologists access, compare, and evaluate chemical analysis data on moon rocks and soil composition obtained from the Apollo-11 mission.
Overview (38)
Applied Approaches in NLU Systems (10)
LUNAR [William Woods, 1973] (2)
It operated by translating a question entered in English into an expression in a formal query language. The translation was done with an ATN parser coupled with a rule-driven semantic interpretation procedure.
Overview (39)
Applications of NLU (1)
Text-Based Applications
Finding appropriate documents on certain topics from a database of texts;
Extracting information from messages or articles on certain topics;
Translating documents from one language to another;
Summarizing texts for certain purposes.
Overview (40)
Applications of NLU (2)
Dialogue-Based Applications
Question-answering systems, where natural language is used to query a database;
Automated customer service over the telephone;
Tutoring systems, where the machine interacts with a student;
General cooperative problem-solving systems.
Overview (41)
CL Research Topics (1)
Call for Papers from ACL-2010 (1)
Discourse, dialogue, and pragmatics
Grammar engineering
Information extraction
Information retrieval
Knowledge acquisition
Large scale language processing
Language generation
Language processing in domains such as bioinformatics, legal, medical, etc.
Language resources, evaluation methods and metrics, science of annotation
Lexical/ontological/formal semantics
Machine translation
Mathematical linguistics, grammatical formalisms
Mining from textual and spoken language data
Overview (42)
CL Research Topics (2)
Call for Papers from ACL-2010 (2)
Multilingual language processing
Multimodal language processing (including speech, gestures, and other communication media)
NLP applications and systems
NLP on noisy unstructured text, such as emails, blogs, SMS
Phonology/morphology, tagging and chunking, word segmentation
Psycholinguistics
Question answering
Semantic role labeling
Sentiment analysis and opinion mining
Spoken language processing
Statistical and machine learning methods
Summarization
Syntax, parsing, grammar induction
Text mining
Textual entailment and paraphrasing
Topic and text classification
Word sense disambiguation
Overview (43)
CL Research Topics (3)
Accepted Regular Paper Statistics for JSCL-2005 (formal CCL)
lexical, syntactical, semantic and discourse analysis: 24 papers, 29.3%;
resource building and related techniques: 12 papers, 14.6%;
machine translation techniques, system and evaluation: 8 papers, 9.7%;
intelligent retrieval: 30 papers, 36.6%;
others: 8 papers, 9.7%.