Introduction to Natural Language Generation
Yael Netzer
Department of Computer Science
Ben Gurion University
Outline
• Introduction – what is NLG
• Traditional architecture of NLG system
• Statistical methods in NLG
• FUF/SURGE
• An example in Hebrew – the noun phrase
• A statistical method for generation
Yael Netzer BGU
6 November 2001
What is Natural Language Generation (NLG)
NLG is the process of constructing natural
language outputs from non-linguistic inputs.
[VanLinden]
NLG is mapping some communication goal to
some surface utterance that satisfies the
goal. [Reiter & Dale]
Aspects in NLG
• Theoretical and practical interests:
– Theoretical: modeling various depths of human
language representation and production.
– Practical: engineering human/computer
interfaces (computer as an author/authoring
aid).
Example systems:
• NLG as an author:
– Weather reports (FoG)
– Stock market descriptions
– Museum artifact descriptions (ILEX)
– “Personal” letters to customers (AlethGen)
• NLG as an author aid
• Integrated (partial) NLG uses:
– NLG in augmentative and alternative communication
– Summarization (integrate ‘cut and paste’ techniques
with generation)
– Machine Translation (generation from interlingua)
Inputs of NLG systems
Formally, the input of a system can be defined as a four-tuple: {k, c, u, d}
• k - knowledge source (tables of numbers,
knowledge representation language). It is domain
dependent, with no generalizations.
• c - communicative goal: the consequence of
a given execution of the system (considering
appropriate information)
NLG input spec. cont.
u - user model: characterization of the hearer
or intended audience for whom the text is to
be generated.
d - discourse history: previous interactions
between the user and the NLG system, used to control
anaphoric forms and prevent repetition.
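The four-tuple can be pictured as a plain record. The sketch below is only an illustration: the class name `NLGInput`, its field names, and the sample weather values are assumptions made for this example and do not come from any particular NLG system.

```python
from dataclasses import dataclass, field


@dataclass
class NLGInput:
    """Hypothetical container for the four-tuple {k, c, u, d}."""
    knowledge: dict                     # k: domain-dependent knowledge source
    goal: str                           # c: communicative goal
    user_model: dict                    # u: characterization of the audience
    discourse_history: list = field(default_factory=list)  # d: earlier output


# Made-up sample values, used only to show how the record is filled in.
request = NLGInput(
    knowledge={"month": "June", "mean_temp_c": 17.2, "normal_temp_c": 19.0},
    goal="summarize-monthly-weather",
    user_model={"expertise": "layperson"},
)

# Recording each generated text lets later turns avoid repeating it.
request.discourse_history.append("The month was cooler than average.")
```

Keeping d as a mutable list reflects that the discourse history grows during an interaction, while k, c and u are fixed for a single request.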
The output for an NLG system
Any text conveying the communicative goal:
it can be a single word, like “yes” in a dialogue, or a text consisting of many paragraphs in
other cases.
The output should be related to the medium:
web pages with hyperlinks, voice stream etc.
Main (Pipeline) Architecture
• Content determination
– What information should be included in the text?
• Document structuring
– how to organize text
• Lexicalisation
– choosing particular words or phrases
• Aggregation
– composing chunks of info into sentences.
• Referring expression generation
– what properties should be used in referring to an entity.
• Surface realization
– mapping underlying content of text to a grammatically
correct sentence that expresses the desired meaning.
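The six stages listed above can be sketched as a chain of small functions. This is a toy illustration under loud assumptions: the function names, the weather attributes and the word table are all invented here, and real systems implement each stage with far richer knowledge.

```python
def determine_content(data):
    """Content determination: keep only attributes that deviate from normal."""
    messages = []
    for attribute in ("temperature", "rainfall"):
        diff = data[attribute] - data["normal_" + attribute]
        if diff != 0:
            messages.append({"attribute": attribute, "diff": diff})
    return messages


def structure_document(messages):
    """Document structuring: impose an order (temperature before rainfall)."""
    order = {"temperature": 0, "rainfall": 1}
    return sorted(messages, key=lambda m: order[m["attribute"]])


def lexicalise(messages):
    """Lexicalisation: map each message to a comparative adjective."""
    words = {("temperature", True): "warmer", ("temperature", False): "cooler",
             ("rainfall", True): "wetter", ("rainfall", False): "drier"}
    return [words[(m["attribute"], m["diff"] > 0)] for m in messages]


def aggregate(adjectives):
    """Aggregation: fold the adjectives into one predicate."""
    return " and ".join(adjectives)


def refer(entity, already_mentioned):
    """Referring expression generation: pronoun after the first mention."""
    return "it" if already_mentioned else "the " + entity


def realise(subject, predicate):
    """Surface realization: produce the final sentence string."""
    sentence = f"{subject} was {predicate} than average."
    return sentence[0].upper() + sentence[1:]


def generate(data):
    """Run the stages in pipeline order; return '' if nothing is worth saying."""
    messages = structure_document(determine_content(data))
    if not messages:
        return ""
    predicate = aggregate(lexicalise(messages))
    return realise(refer("month", already_mentioned=False), predicate)
```

Each stage consumes only the output of the previous one, which is the defining property of the pipeline architecture.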
Content Determination
Content determination:
• The process of deciding what to say.
• No general rules - domain specific.
– Decide what is important: what should always be
included, what counts as exceptional information, etc.
– In practice, construct a set of messages from
the underlying data (entities, concepts and
relations).
Document Structuring
Document Structuring:
imposing ordering and structure over the
information.
- conceptual grouping
- rhetorical relationships.
Lexical choice
Lexical chooser:
• determining the particular words to be used to
express concepts and relations.
• Trade-off: complexity of coding vs. richer language.
– choosing content words: information is mapped from
conceptual vocabulary.
– The lexical chooser should supply a variety of words, consider the user
model [precise vs. general description of a weather
phenomenon], and account for pragmatic
considerations (formal vs. casual style).
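A lexical chooser of this kind can be pictured as a lookup keyed by concept, user model and style. The lexicon entries and the fallback policy below are assumptions invented for this sketch.

```python
# concept -> {(audience, style): word}; every entry here is illustrative.
LEXICON = {
    "precipitation": {
        ("expert", "formal"): "precipitation",
        ("layperson", "formal"): "rainfall",
        ("layperson", "casual"): "rain",
    },
}


def choose_word(concept, audience, style):
    """Pick a word for a concept, given the user model and the desired style."""
    options = LEXICON[concept]
    if (audience, style) in options:
        return options[(audience, style)]
    # Fall back to any word known for this audience, whatever the style.
    for (known_audience, _), word in options.items():
        if known_audience == audience:
            return word
    # Last resort: use the concept name itself.
    return concept
```

For example, `choose_word("precipitation", "layperson", "casual")` selects the everyday word, while an expert audience gets the precise term.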
Aggregation
Aggregation can be performed at various
stages:
– In the planner: combines similar data.
– In lexicalization: aggregates some concepts into
one lexical element.
– Aggregation of sentences:
• “The month was cooler than average. The month
was drier than average.” becomes “The month was cooler
and drier than average.”
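Sentence-level aggregation can be sketched as merging adjacent clauses that share everything except one constituent. The clause representation (keys `subject`, `verb`, `adjectives`, `than`) is a hypothetical one chosen for this example.

```python
def aggregate_clauses(clauses):
    """Merge consecutive clauses sharing subject, verb and comparison target."""
    merged = []
    for clause in clauses:
        previous = merged[-1] if merged else None
        if previous and all(previous[key] == clause[key]
                            for key in ("subject", "verb", "than")):
            previous["adjectives"].extend(clause["adjectives"])
        else:
            # Copy the clause so the caller's input is not mutated.
            merged.append({**clause, "adjectives": list(clause["adjectives"])})
    return merged


def render(clause):
    """Turn one (possibly aggregated) clause into a sentence string."""
    adjectives = clause["adjectives"]
    if len(adjectives) > 1:
        joined = ", ".join(adjectives[:-1]) + " and " + adjectives[-1]
    else:
        joined = adjectives[0]
    return f'{clause["subject"]} {clause["verb"]} {joined} than {clause["than"]}.'


clauses = [
    {"subject": "The month", "verb": "was", "adjectives": ["cooler"], "than": "average"},
    {"subject": "The month", "verb": "was", "adjectives": ["drier"], "than": "average"},
]
sentences = [render(clause) for clause in aggregate_clauses(clauses)]
```

Clauses with a different subject or verb are left as separate sentences, so the merge never changes what is asserted.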
Referring expression generation
Referring Expression Generation:
– an entity can be referred to in many ways: initially,
subsequently, distinguishing, definite, pronouns.
• Proper names:
– באר שבע (Be'er Sheva)
– באר שבע בית הנגב (Be'er Sheva, home of the Negev)
• Definite descriptions:
– The train that leaves at 10am
– The next train.
• Pronouns
– it
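The choice among pronoun, proper name and definite description can be sketched as below. The slides do not name an algorithm; this example swaps in a simplified incremental attribute selection, and the entity layout and the `PHRASES` templates are assumptions made for illustration.

```python
# Hypothetical templates for turning a property into a modifier phrase.
PHRASES = {"departure": "that leaves at {}", "destination": "to {}"}


def refer(entity, discourse_history, distractors):
    """Pick a referring expression for `entity`.

    discourse_history: ids of entities already mentioned, most recent last.
    distractors: other entities the hearer might confuse with `entity`.
    """
    if discourse_history and discourse_history[-1] == entity["id"]:
        return "it"                      # just mentioned: a pronoun suffices
    if entity.get("name"):
        return entity["name"]            # proper name
    # Definite description: add properties until no distractor still matches.
    remaining = [d for d in distractors if d["type"] == entity["type"]]
    modifiers = []
    for attribute, value in entity["properties"].items():
        if not remaining:
            break
        still_confusable = [d for d in remaining
                            if d["properties"].get(attribute) == value]
        if len(still_confusable) < len(remaining):   # property rules someone out
            modifiers.append(PHRASES[attribute].format(value))
            remaining = still_confusable
    return " ".join(["the", entity["type"]] + modifiers)


train_a = {"id": "t1", "type": "train",
           "properties": {"destination": "Tel Aviv", "departure": "10am"}}
train_b = {"id": "t2", "type": "train",
           "properties": {"destination": "Tel Aviv", "departure": "noon"}}
```

Here the shared destination is skipped because it does not distinguish the two trains, so only the departure time ends up in the description.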
Syntactic realizer
Syntactic Realizer: syntax and morphology.
– Most general, domain independent (but definitely
language dependent).
– Various Usage Scenarios
– Input to syntactic realization is not observable
• Input for syntactic realizers in NLG
– What knowledge is needed to prepare input?
– Who supplies this knowledge?
– Can we find an abstraction that is common across languages and
applications?
Possible techniques for realizers
• Bi-directional grammar specification.
• Grammar specifications tuned for
generation.
• Templates
• Corpus statistics
A note on bi-directional grammar
• Realization is, in some respects, easier than
parsing: no need to handle the full range of syntax
that a human might use, no need to resolve
ambiguities, no need to recover from ill-formed input.
• A bi-directional grammar is, in theory, an
elegant approach.
• However, most NLG systems use a generation-oriented grammar.
Why not bi-directional?
• Output of NLU parser is very different from the
input to an NLG realizer.
• Not obvious that lexicalization is a part of the
realization.
• In practice, it is not easy to engineer large bi-directional grammars.
• Moreover, generation is a process of making choices,
including the choice to use ‘canned text’ when needed.
Syntactic Realizer
• This work concerns Syntactic Realizers –
the grammar
• Input for the grammar: a lexicalized
representation of a phrase at various levels
of abstraction.
• Output of the grammar: a grammatical string
that represents the information in the
input as accurately as possible.
The input question is:
[Diagram: boxes labeled “Knowledge base”, “Application”, “Content planner and lexicon” and “Syntactic Realizer”; the input to the Syntactic Realizer is marked “Input??”]
FUF/SURGE - Implementation
• The grammar is written in FUF, the Functional
Unification Formalism [Elhadad]
• FD (functional description): a list of (att val) pairs,
where val = atom | FD | path
• Grammar: a meta-FD, with disjunction expressed by ALT and control by
NONE, GIVEN, ANY.
• All components in the generation process can be
implemented with this formalism.
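The core operation behind this formalism, unification of functional descriptions, can be sketched in a few lines. This is not FUF itself: FDs are modeled as nested Python dicts, ALT as a plain list of alternative branches, and paths and the NONE, GIVEN, ANY controls are left out. All function names are assumptions made for this example.

```python
def unify(fd1, fd2):
    """Unify two FDs (nested dicts); return the merged FD, or None on failure."""
    if isinstance(fd1, dict) and isinstance(fd2, dict):
        result = dict(fd1)
        for att, val in fd2.items():
            if att in result:
                merged = unify(result[att], val)
                if merged is None:
                    return None          # conflicting values: unification fails
                result[att] = merged
            else:
                result[att] = val        # the grammar enriches the input
        return result
    # Atoms (and anything that is not a pair of dicts) unify only if equal.
    return fd1 if fd1 == fd2 else None


def unify_alt(input_fd, branches):
    """ALT-style disjunction: return the first branch that unifies with the input."""
    for branch in branches:
        result = unify(input_fd, branch)
        if result is not None:
            return result
    return None


# A toy two-branch grammar; the `pattern` values are illustrative only.
grammar = [
    {"cat": "clause", "pattern": ["subject", "verb", "object"]},
    {"cat": "np", "pattern": ["determiner", "head"]},
]
enriched = unify_alt({"cat": "np", "head": {"lex": "girl"}}, grammar)
```

The `cat` feature of the input decides which branch succeeds, and the result carries both the input features and whatever the selected branch adds.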
Requirements for a syntactic realizer
• Mapping thematic structure onto syntactic roles.
• Control of syntactic paraphrasing and alternations.
• Provision of defaults for syntactic features.
• Propagation of agreement features.
• Selection of closed class words.
• The imposition of linear precedence constraints.
• The inflection of open class words.
SURGE [Elhadad&Robin 96]
• Draws on Functional Grammar, HPSG and descriptive
studies of language.
• Input for the grammar is a lexicalized
representation of a phrase (a clause, NP, AP).
• Keeping syntactic information in the input minimal
keeps purely syntactic knowledge out of the earlier
stages of the process, gives the
grammar paraphrasing power, and is also useful
for multilingual applications.
Input for SURGE in general
• Each constituent has the feature cat which
determines which part of the grammar it will be
unified with.
• The representation of the clause is mostly
semantic: a process (in SFL terms) and its
participants. Paraphrasing can be done by changing a single
feature, such as focus.
• The input of an NP uses mostly syntactic features.
• Paraphrases require different inputs.
An Example
((cat clause)
 (tense past)
 (focus {partic affected})
 (process ((type material)
           (agentless no)
           (lex "kiss")))
 (participants
  ((agent ((cat proper)
           (lex "John")))
   (affected ((cat common)
              (lex "girl"))))))
With (focus {partic affected}): The girl was kissed by John.
Without the focus feature: John kissed the girl.
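To make the effect of the focus feature concrete, the input above can be mirrored as a nested Python dict and passed to a toy realizer. This is not SURGE: the function names are hypothetical, only the past tense and regular verbs are handled, and common nouns always receive a definite article.

```python
def realize_np(np):
    """Proper nouns stand alone; common nouns get 'the' (a simplification)."""
    return np["lex"] if np["cat"] == "proper" else "the " + np["lex"]


def realize_clause(fd):
    """Realize a past-tense clause; focus on the affected yields the passive."""
    verb = fd["process"]["lex"]
    past = verb + "ed"       # regular verbs only: past tense and past participle
    agent = realize_np(fd["participants"]["agent"])
    affected = realize_np(fd["participants"]["affected"])
    if fd.get("focus") == ("partic", "affected"):
        words = f"{affected} was {past} by {agent}"    # passive voice
    else:
        words = f"{agent} {past} {affected}"           # active voice (default)
    return words[0].upper() + words[1:] + "."


clause = {
    "cat": "clause",
    "tense": "past",
    "focus": ("partic", "affected"),
    "process": {"type": "material", "agentless": "no", "lex": "kiss"},
    "participants": {
        "agent": {"cat": "proper", "lex": "John"},
        "affected": {"cat": "common", "lex": "girl"},
    },
}
passive = realize_clause(clause)
# Dropping the single focus feature switches the paraphrase.
active = realize_clause({k: v for k, v in clause.items() if k != "focus"})
```

The semantic content (process and participants) is identical in both calls; only the one feature differs, which is the paraphrasing power the clause input is meant to provide.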