Parsing Algorithms 1 - University of Malta

CSA3180: Natural Language
Processing
Semantics I – Truth Conditions, FOL, Quantified
Sentences, XML and Taxonomies
• Truth Conditions and First Order Logic
• Quantified Sentences
• Translating English into FOL and vice-versa
• XML in an NLP context
• Semantic Web
• Taxonomies
November 2005
CSA3180: Semantics I
1
Introduction
• Quantification and FOL/English translation
slides partly based on Introduction to Logic
Lectures by Angelo Dalli given in 2000
• Quotes from W3C website and NLPRS
2001 Tokyo
• Will introduce the concepts of linking
semantics to syntactic objects
• Taxonomies and the use of XML in an NLP
context
Quantification
Predicate Logic addresses shortcomings of
Propositional Logic mainly by introducing
predicates.
Atomic or Compound Propositional statements like
“This whiteboard is white” do not allow us to get
to more generic/lower level concepts, like “You
can write on all whiteboards”.
Propositional to Predicate
Predicate logic uses the notion of variables.
Variables are used as placeholders that indicate
relationships between quantifiers and argument
positions of predicates.
So apart from statements like father(Max) and
mother(Claire) we can have father(X) and
mother(Y).
Propositional Logic
Propositional logic is thus similar to algebra
using constants only (like 1+(2/3)), while
predicate logic uses variables (like
x+(y/z)).
Variables
• Variables are named symbolically - a,b,c. In Prolog they
start with an uppercase letter or an underscore.
• Variables can appear in argument lists, ex. big(i)
• Variables can appear in place of constants, ex.
student(x) → noisy(x)
• With the help of variables we can produce wffs - man(x),
mortal(x)
Formulae vs. Sentences
A formula like man(x) is not a sentence because it does not
make an identifiable claim. To make such claims we
require quantifiers in order to actually bind the
variables (in this case ‘x’).
Examples of wffs built from atomic wffs:
cube(a) ∧ big(a) ∧ green(a)
doctor(x) ∧ expensive(x)
Examples of sentences which we would like to represent in FOL:
All green cubes are green
Some doctors are expensive
Quantifiers
Quantifiers are therefore needed for two reasons: Propositional
logic lacks the expressiveness to make general claims, and
Predicate logic needs a way to turn wffs with variables into sentences.
Quantifiers tell us about the number or quantity of things
that satisfy some of the conditions within the scope of
the quantifier.
They are also used to help bind variables to values within a
universe of discourse.
The universe of discourse is the domain of the
interpretation under consideration, or, more formally, ‘the
set of individual objects which we are discussing now’.
UNIVERSAL Quantifier

The first of the two quantifiers is the universal quantifier ∀:
“for all” or “for every”
The domain of the ∀ quantifier when we say (∀x) includes all those objects
that can take up the value of ‘x’ in the universe of discourse; the wff has to
hold for every one of them.
The variable named by ∀ is bound throughout its scope, so
(∀x)(is_integer(x) → has_prime_fac(x))
is exactly equivalent to
(∀y)(is_integer(y) → has_prime_fac(y))
However, the following is not equivalent, because ‘y’ is left free:
(∀x)(is_integer(x) → has_prime_fac(y))
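Over a finite universe of discourse the ∀ quantifier can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: the domain 2..20 and the helper names forall and implies are assumptions made for illustration, while is_integer and has_prime_fac mirror the predicates above.

```python
# Hypothetical finite universe of discourse: the integers 2..20.
DOMAIN = range(2, 21)

def is_integer(x):
    return isinstance(x, int)

def is_prime(p):
    # Trial division; adequate for the tiny domain used here.
    return p >= 2 and all(p % d for d in range(2, p))

def has_prime_fac(x):
    # True if some prime p divides x.
    return any(x % p == 0 and is_prime(p) for p in range(2, x + 1))

def implies(p, q):
    # Truth table for implication: false only when p is true and q is false.
    return (not p) or q

def forall(domain, wff):
    # (∀x) wff(x): the wff must hold for every object in the domain.
    return all(wff(x) for x in domain)

# Renaming the bound variable does not change the truth value.
by_x = forall(DOMAIN, lambda x: implies(is_integer(x), has_prime_fac(x)))
by_y = forall(DOMAIN, lambda y: implies(is_integer(y), has_prime_fac(y)))
print(by_x, by_y)
```

The two lambdas differ only in the name of the bound variable, which is why both calls return the same value.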
UNIVERSAL Quantifier
E.g.1.

Every (all) student is noisy
That is, for all x,
if x is a student,
then x is noisy.
For all x, (student(x) → noisy(x))
(∀x)(student(x) → noisy(x))
E.g.2.
All men are mortal.
Socrates is a man.
Therefore Socrates is mortal.
For all y, (is_a_man(y) → is_mortal(y))
(∀y)(is_a_man(y) → is_mortal(y))
EXISTENTIAL Quantifier

The second quantifier is the existential quantifier ∃, meaning “there exists” or
“there is at least one” object in the domain that binds with the variable to
satisfy the wff.
The scope of ∃, that is, the part of the formula to which it applies, is determined
in the same way as for ∀: it is exactly where the variable is bound to some value or
object within the domain of discourse.
So the order of the quantifiers and the use of brackets are very important, as seen
in this example:
∀x ∃y (y = 2x)
Is it O.K. if:
∃y ∀x (y = 2x)
More about scope in ‘Free vs Bound’ slide.
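Swapping the two quantifiers changes the claim being made. A small Python sketch, under the assumption of two finite ranges (XS for the values of x and YS for the values of y, both chosen only for illustration), makes the difference visible:

```python
# Hypothetical finite domains: x ranges over 0..4, y over 0..8.
XS = range(0, 5)
YS = range(0, 9)

# ∀x ∃y (y = 2x): every x has its own y that is its double.
forall_exists = all(any(y == 2 * x for y in YS) for x in XS)

# ∃y ∀x (y = 2x): one single y is the double of every x.
exists_forall = any(all(y == 2 * x for x in XS) for y in YS)

print(forall_exists, exists_forall)
```

In the first reading the choice of y may depend on x; in the second the same y has to work for all x, which fails as soon as the domain holds two different values of x.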
EXISTENTIAL Quantifier

E.g.1. Some persons never learn.
That is, there exists at least one x such that
x is a person
and x will never learn.
(∃x)(person(x) ∧ never_learns(x))
E.g.2. Some footballers will never play in the Premier or First division.
Reformulating, there exists y such that y is a footballer and y will not play in the
Premier or First division.
There exists at least one person, y, for whom
ftball(y) ∧ ~(prem(y) ∨ div1(y))
(∃y)(ftball(y) ∧ ~(prem(y) ∨ div1(y)))
Free vs. Bound Variables
If P is a wff and ‘v’ is a variable, then:
∀v P and ∃v P are wffs too, and ‘v’ is bound in them.
E.g. ∀x (student(x) → noisy(x))
‘x’ is bound within the scope of the ∀
A variable which is not bound in P is said to be unbound or
free in P.
E.g. ∀x (student(x) → noisy(y))
‘y’ is unbound within the scope of ∀
A sentence is a wff with NO unbound variables.
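The definition can be turned into a short recursive check. This is a sketch only: the nested-tuple encoding of wffs and the function names free_vars and is_sentence are assumptions made for illustration, and every predicate argument is treated as a variable (no constants).

```python
def free_vars(wff):
    """Return the set of variables that occur free in a wff."""
    kind = wff[0]
    if kind == "pred":                        # ("pred", name, var1, var2, ...)
        return set(wff[2:])
    if kind == "not":                         # ("not", sub)
        return free_vars(wff[1])
    if kind in ("and", "or", "implies"):      # (connective, left, right)
        return free_vars(wff[1]) | free_vars(wff[2])
    if kind in ("forall", "exists"):          # (quantifier, var, body)
        # The quantifier binds its own variable inside the body.
        return free_vars(wff[2]) - {wff[1]}
    raise ValueError("unknown node: " + str(kind))

def is_sentence(wff):
    # A sentence is a wff with no unbound variables.
    return not free_vars(wff)

closed = ("forall", "x",
          ("implies", ("pred", "student", "x"), ("pred", "noisy", "x")))
open_wff = ("forall", "x",
            ("implies", ("pred", "student", "x"), ("pred", "noisy", "y")))

print(free_vars(closed), is_sentence(closed))
print(free_vars(open_wff), is_sentence(open_wff))
```

The first formula has no free variables and so counts as a sentence; the second leaves ‘y’ free and does not.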
Points to Remember
• Quantified sentences make claims about some intended domain of
discourse.
• A sentence of the form ∀x P(x) is TRUE iff the wff P(x) is
satisfied by every object in the domain of discourse.
• A sentence of the form ∃x P(x) is TRUE iff the wff P(x) is
satisfied by some object (at least one) in the domain of discourse.
Translating Quantified Sentences
• ∀ is often used in sentences like the following:
• Every P is a Q
• ∀x (P(x) → Q(x))
• While ∃ is normally used as follows:
• There is a P which also has property Q.
• ∃x (P(x) ∧ Q(x))
• It is often tempting to translate the latter sentence as:
• ∃x (P(x) → Q(x))
• but this means something rather different, being true just
in case there is an object which is either not a P or else
is a Q; in particular, it is true when there is no object
satisfying P(x).
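The difference between the two translations shows up in a world that contains no P at all. A minimal Python sketch, assuming a hypothetical two-object world encoded as dictionaries:

```python
# Hypothetical world: two objects, neither of which satisfies P.
world = [
    {"name": "a", "P": False, "Q": False},
    {"name": "b", "P": False, "Q": True},
]

def exists(domain, wff):
    # ∃x wff(x): at least one object in the domain satisfies the wff.
    return any(wff(obj) for obj in domain)

# ∃x (P(x) ∧ Q(x)): "there is a P which is also a Q".
conj = exists(world, lambda o: o["P"] and o["Q"])

# ∃x (P(x) → Q(x)): satisfied by any object that is not a P.
cond = exists(world, lambda o: (not o["P"]) or o["Q"])

print(conj, cond)
```

The conjunction is false, as expected when there is no P, while the implication version comes out true.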
Vacuously True Sentences
• Suppose we try to evaluate the sentence:
• ∀x (student(x) → noisy(x))
• in a world where there are no students. Nobody will
satisfy the first part (student(x)) and so from the truth
table for implication, all the possible instances come out
True - hence the universal statement holds.
• From this we can conclude that any sentence of the
form:
• ∀x (P(x) → Q(x))
• is vacuously true in a world where no object satisfies
the first part, P(x).
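Vacuous truth is easy to reproduce over a finite world. A minimal sketch, assuming a hypothetical world of two people, neither of whom is a student:

```python
# Hypothetical world with no students.
people = [
    {"name": "Max", "student": False, "noisy": True},
    {"name": "Claire", "student": False, "noisy": False},
]

# ∀x (student(x) → noisy(x))
all_students_noisy = all((not p["student"]) or p["noisy"] for p in people)

# ∀x (student(x) → ~noisy(x))
all_students_quiet = all((not p["student"]) or (not p["noisy"]) for p in people)

print(all_students_noisy, all_students_quiet)
```

Both universal statements hold at once, which is only possible because no object satisfies student(x).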
Complex Noun Phrases
• Most of the time we use ∀ to translate sentences with “every” or
“all”.
• Every small dog that is at home is happy.
• ∀x ((small(x) ∧ dog(x) ∧ at_home(x)) → happy(x))
• and we use ∃ to translate sentences involving “a”.
• A small happy dog is at home.
• ∃x (small(x) ∧ happy(x) ∧ dog(x) ∧ at_home(x))
• However, sometimes “a” also has a universal sense, as in:
• A dog is a kind of mammal.
• ∀x ∃y (dog(x) → (kind_of(x,y) ∧ mammal(y)))
Quantifier Equivalence
• If it is a known fact that not everything has some
property, then it follows that there is something that does
not have that property.
• Symbolically, ~∀x P(x) ⇔ ∃x ~P(x)
• Similar to ~(A∧B∧…) ⇔ (~A∨~B∨…)
• ~(P(x1)∧P(x2)∧...) ⇔ (~P(x1)∨~P(x2)∨…)
• Similarly, if it is a known fact that it’s not the case that
something has a property, then all things do not have
that property.
• Symbolically, ~∃x P(x) ⇔ ∀x ~P(x)
• Similar to ~(A∨B∨…) ⇔ (~A∧~B∧…)
• ~(P(x1)∨P(x2)∨...) ⇔ (~P(x1)∧~P(x2)∧…)
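Over a finite domain both equivalences can be confirmed by brute force. The sketch below assumes a hypothetical three-object domain and tries every possible way of assigning true or false to P; it illustrates the equivalences and is not a proof for arbitrary domains.

```python
from itertools import product

# Hypothetical three-object domain of discourse.
DOMAIN = ["x1", "x2", "x3"]

def check_equivalences():
    # Enumerate every possible extension of the predicate P over DOMAIN.
    for values in product([False, True], repeat=len(DOMAIN)):
        P = dict(zip(DOMAIN, values))
        # ~∀x P(x)  versus  ∃x ~P(x)
        lhs1 = not all(P[x] for x in DOMAIN)
        rhs1 = any(not P[x] for x in DOMAIN)
        # ~∃x P(x)  versus  ∀x ~P(x)
        lhs2 = not any(P[x] for x in DOMAIN)
        rhs2 = all(not P[x] for x in DOMAIN)
        if lhs1 != rhs1 or lhs2 != rhs2:
            return False
    return True

print(check_equivalences())
```

Here all and any play the role of the long conjunction and disjunction over P(x1), P(x2), and so on.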
Multiple Quantifiers
• Some cube is to the left of some tetrahedron.
∃x ∃y (cube(x) ∧ tet(y) ∧ leftof(x,y))
Precisely expressing the logical formula as an English
sentence reading from left to right:
‘There exists x, there exists y, such that x is a cube, y is a
tetrahedron and x is on the left of y’
• All cubes are to the left of all tetrahedrons.
∀x ∀y ((cube(x) ∧ tet(y)) → leftof(x,y))
‘For all x, for all y, if x is a cube and y is a tetrahedron, then x
is to the left of y’
Prenex Form
• When translating from English to FOL
quantifiers and connectives usually end up
mixed together.
• In prenex form all quantifiers are put at the
start of the sentence, followed by a wff that
is quantifier-free.
Q1 v1 Q2 v2 … Qn vn P
• Where every Qi is either ∀ or ∃, each vi is
a variable and P is a quantifier-free wff.
Restrictions and Sets
Restricted quantifiers – quantifiers that are restricted to
some set membership.
Ex. Let P(x) denote the predicate that is true when x is a
person. The set P generated by P(x) is then the set of all
persons.
Quantifying over this set only is denoted formally by (∀x)P
Alternatively you can define P(x) and then say that x ∈ P.
Then you can simply write down ∀x
Restrictions and Sets (diagram)
P(x) generates P, which is the set of all people: (∀x)P
FOL to English Translation
• Two main steps:
• 1. Translate the formula by writing the literal
meanings of the logical symbols and predicates
as they occur.
• 2. Reword the sentence so that it has the
same logical meaning (the truth or falsity of the
sentence should not change) but is written in
more ‘acceptable’ English. This actually involves
avoiding the use of variable names.
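Step 1 is mechanical enough to sketch in code. The following Python fragment is an illustration only: the nested-tuple encoding of formulae, the GLOSS table and the function name literal are all assumptions, not part of the course material.

```python
# Hypothetical glossary mapping predicate names to English templates.
GLOSS = {"student": "{0} is a student", "noisy": "{0} is noisy"}

def literal(wff):
    """Step 1: read the formula symbol by symbol, left to right."""
    kind = wff[0]
    if kind == "pred":                        # ("pred", name, arg1, ...)
        return GLOSS[wff[1]].format(*wff[2:])
    if kind == "not":
        return "it is not the case that " + literal(wff[1])
    if kind == "and":
        return literal(wff[1]) + " and " + literal(wff[2])
    if kind == "or":
        return literal(wff[1]) + " or " + literal(wff[2])
    if kind == "implies":
        return "if " + literal(wff[1]) + " then " + literal(wff[2])
    if kind == "forall":                      # ("forall", var, body)
        return "for all " + wff[1] + ", " + literal(wff[2])
    if kind == "exists":                      # ("exists", var, body)
        return "there exists " + wff[1] + " such that " + literal(wff[2])
    raise ValueError("unknown node: " + str(kind))

formula = ("forall", "x",
           ("implies", ("pred", "student", "x"), ("pred", "noisy", "x")))
print(literal(formula))  # for all x, if x is a student then x is noisy
```

Step 2, rewording the literal reading into something like ‘Every student is noisy’, is the part that needs human judgement and is not attempted here.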
Alternative Notations
Course Notation    Alternative Notations
¬P                 ~P, !P, …P, Np
P∧Q                P&Q, P&&Q, P.Q, PQ, Kpq
P∨Q                P|Q, P||Q, P+Q, Apq
P→Q                P ⊃ Q, Cpq
P↔Q                P ≡ Q, Epq
∀X ∃Y              …X, …Y
Some simple exercises…
Let
van(x) represent ‘x is a van’,
car(x) represent ‘x is a car’,
bike(x) represent ‘x is a bike’,
exp(x,y) represent ‘x is more expensive than y’,
faster(x,y) represent ‘x is faster than y’.
Translate the following formulae into natural language:
1. ∀x (bike(x) → ∃y (car(y) ∧ exp(y,x)))
2. ∀x ∀y ((van(x) ∧ bike(y)) → faster(x,y))
3. ∀z (car(z) → ∀x ∀y ((van(x) ∧ bike(y)) →
(faster(z,x) ∧ faster(z,y) ∧ exp(z,x) ∧ exp(z,y))))
English to FOL Translation
• Inverse translation is much more
challenging. Three main steps:
• Identify predicates in the sentence.
• Rearrange the sentence into a logical
formulation. Capture the essential
meaning of the sentence using predicates,
quantifiers and connectives.
• Cater for expressions involving time such
as ‘always’, ‘afterwards’, etc.
Some more simple exercises…
• Translate the following natural language
statements into predicate logic:
1. Every school boy thinks that Robin Hood is a
hero.
2. Some people will never learn to keep their
mouth shut or to respect other people.
3. A person’s mother is always older than that
same person.
eXtensible Markup Language (XML)
• Universal structured data representation
language
• Framework for web publishing
• E-Commerce Applications (B2B/B2C)
• “Point of Creation” Bottleneck – people
are lazy!
• Too time consuming to markup NLP texts
manually
eXtensible Markup Language (XML)
• NLP applications should help in
automatic markup of texts using XML
• Gives back much richer text structure and
documents
• Intelligence to documents
• Disambiguation and search functionalities
Semantic Web
The Semantic Web provides a common framework that
allows data to be shared and reused across
application, enterprise, and community boundaries. It is
a collaborative effort led by W3C with participation from
a large number of researchers and industrial partners.
It is based on the Resource Description Framework
(RDF), which integrates a variety of applications using
XML for syntax and URIs for naming.
"The Semantic Web is an extension of the current web in
which information is given well-defined meaning, better
enabling computers and people to work in
cooperation." -- Tim Berners-Lee, James Hendler, Ora
Lassila, The Semantic Web, Scientific American, May
2001
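RDF describes resources as subject/predicate/object triples named by URIs. The sketch below shows only that idea, in plain Python with no RDF library; the example.org URIs and the match helper are hypothetical placeholders, not real vocabularies or APIs.

```python
# Hypothetical URIs under example.org; placeholders, not a real vocabulary.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple.
triples = {
    (EX + "doc1", EX + "author", EX + "alice"),
    (EX + "doc1", EX + "topic", EX + "semantics"),
    (EX + "doc2", EX + "author", EX + "alice"),
}

def match(s=None, p=None, o=None):
    # None acts as a wildcard for that position of the triple.
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Which resources have alice as their author?
by_alice = [t[0] for t in match(p=EX + "author", o=EX + "alice")]
print(by_alice)
```

Because every name is a URI, triples produced by different applications can be merged into one set and queried together, which is the sharing and reuse the slide refers to.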
Semantic Web
• Next generation Web?
• http://www.w3.org/2001/sw/
• Many small applications, lots of hype, few
widespread uses
• Most notable: RDF/RSS/Atom for blogs
and news syndication (also for
podcasting)
NLP for XML (NLPRS 2001)
• Ontology extraction into XML based structured
languages using XML Schema
• Message Translation for multilingual B2B, B2C
e-commerce applications
• Automatic XML to XML schema mapping by XML
vocabulary translators with morphological analyzers
• Web (XHTML) resource discovery and indexing
• Automatic hyperlink (XLink) generation
• Multimodal techniques to take advantage of XML
compound documents (e.g. search the key string in
XHTML, MathML, SVG and SMIL components at the
same time)
XML for NLP (NLPRS 2001)
• NL Corpora representation languages and the conversions
among them, from and to RDB, and from raw text
• XML based Machine Translation / Interlingua
• XML based multilingual Web contents management system
• Tree transducers implemented by XSLT
• IR powered by both NLP and XML
• Task-oriented Summarization using XML Schemas
• VoiceXML applications and the dialogue scenario generation
• Foreign language e-Education (CALL) material (texts, drills,
grading systems etc.) generation by XML
Taxonomies
Taxonomy (from Greek ταξινομία (taxinomia)
from the words taxis = order and nomos =
law) may refer to either the classification of
things, or the principles underlying the
classification. Almost anything, animate
objects, inanimate objects, places, and
events, may be classified according to some
taxonomic scheme.
Wikipedia Definition
Taxonomies/Ontologies
• Used to markup texts
• Define XML tags (or SGML) used to markup
semantic objects
• Example: Use <noun> tag to markup “nouns”
• Frequently hierarchical
• Confusion with Ontologies – often referring to same
thing (ontologies used more in Knowledge
Management)
• Ontologies seen sometimes as being broader in
scope than taxonomies
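The <noun> example can be sketched with Python's standard xml.etree.ElementTree module. The tiny NOUNS lexicon, the <s> wrapper element and the function name mark_nouns are assumptions for illustration; a real system would rely on a part-of-speech tagger rather than a word list.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
NOUNS = {"dog", "home"}

def mark_nouns(text):
    """Wrap every token found in NOUNS in a <noun> element."""
    sentence = ET.Element("s")
    last = None  # most recent <noun> element, if any
    for token in text.split():
        if token.lower() in NOUNS:
            last = ET.SubElement(sentence, "noun")
            last.text = token
            last.tail = " "
        elif last is None:
            # Plain text before the first <noun> goes into the parent's text.
            sentence.text = (sentence.text or "") + token + " "
        else:
            # Plain text after a <noun> goes into that element's tail.
            last.tail = (last.tail or "") + token + " "
    # Drop the trailing space left by the loop.
    if last is not None:
        last.tail = last.tail.rstrip()
    elif sentence.text:
        sentence.text = sentence.text.rstrip()
    return ET.tostring(sentence, encoding="unicode")

print(mark_nouns("A small dog is at home"))
```

The tag set itself (here just <noun>) is what the taxonomy defines; the code only applies it.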
Scientific vs. Folk
• Scientific taxonomies:
  • Objective
  • Universal
  • Example: Biological Taxonomy
  (Linnaean/Evolutionary Tree)
• Folk taxonomies:
  • Subjective
  • Vernacular naming system
  • Social knowledge representation
  • Example: Flickr, del.icio.us, podcast labels
  • More or less the same thing as folksonomies
Taxonomies/Ontologies
• Formally represent an acyclic graph/tree
• XML or SGML frequently used as base
language
• Prolog can also be used (80’s AI projects)
• FOL can also be used (Cyc)
• Modern standards: OWL, RDF, RDFS, OIL,
DAML, DAML+OIL
• Welcome to acronym world!
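A tree-shaped taxonomy can be held in a simple child-to-parent table. The sketch below is an assumption-laden illustration: the PARENT entries and the helper names ancestors and is_a are hypothetical and not taken from any of the standards listed above.

```python
# Hypothetical taxonomy stored as child -> parent links (a tree).
PARENT = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
}

def ancestors(term):
    """Walk up the tree and collect every broader category."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain

def is_a(term, category):
    # True if category lies somewhere above term in the taxonomy.
    return category in ancestors(term)

print(ancestors("dog"))
print(is_a("dog", "animal"), is_a("dog", "bird"))
```

The walk in ancestors terminates only because the structure is acyclic, which is why that property matters in the formal definition.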
Stuff to lookup
• RDF, DAML+OIL
• RSS
• Podcasting – behind the scenes
• (Non-comprehensive) List of NLP-related
projects using ontologies
• http://www.cs.utexas.edu/users/mfkb/related.html