TextNet – A Text-Based
Intelligent System
Sanda Harabagiu
Dan Moldovan
as (mis-)interpreted by Peter Clark
Introduction
• Overall goal:
– Given a sentence/paragraph, create a representation of
the unstated, extra knowledge (“context”) which it
suggests.
– Input: sentence graph; Output: bigger, richer graph
• Purpose: Question-answering etc. (?)
• Sources of this extra knowledge:
– (Extended) WordNet
– the Internet
WordNet
• Organized around concepts (“synsets”), not words
• Contains:
– ~100k concepts (“synsets”)
– ~350k connections (14 types)
– English definitions (“glosses”) for most synsets
Example (slide diagram): {“tennis”, “lawn tennis”} (synset 433243) –isa→ {“athletic game”} (synset 132132)
– {“athletic game”}: “Game involving athletic activity.”
– {“tennis”, “lawn tennis”}: “A game played with rackets by two or four players who hit the ball over a net that divides the court.”
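A minimal sketch of the synset idea, assuming a toy in-memory record rather than the real WordNet database: a concept bundles its synonym words, a gloss and its isa links, and a word maps to concepts rather than the other way round. The class `Synset`, the helper `lookup` and the field names are hypothetical; the IDs are simply the labels from the slide's diagram.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Synset:
    """A WordNet-style concept: synonymous words, an English gloss, isa links."""
    synset_id: int
    words: tuple[str, ...]
    gloss: str
    isa: list[Synset] = field(default_factory=list)  # links to more general concepts


athletic_game = Synset(132132, ("athletic game",), "Game involving athletic activity.")
tennis = Synset(
    433243,
    ("tennis", "lawn tennis"),
    "A game played with rackets by two or four players who hit the ball "
    "over a net that divides the court.",
    isa=[athletic_game],
)


def lookup(word: str, synsets: list[Synset]) -> list[Synset]:
    """Map a word to every concept that contains it (a word may have several senses)."""
    return [s for s in synsets if word in s.words]


print([s.synset_id for s in lookup("lawn tennis", [athletic_game, tennis])])  # [433243]
print(tennis.isa[0].words)  # ('athletic game',)
```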
Extended WordNet
• Disambiguate and transform glosses into network
representations.
“Tennis court: A court in which tennis is played.”
– tennis court –def→ court –location-of→ play –object→ tennis
– the gloss word “tennis” is linked to the synset {“tennis”, “lawn tennis”}
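One way to picture a transformed gloss, as a sketch only: store it as (head, relation, tail) triples and walk it from the defined term. The triples are copied from the tennis-court diagram; `TENNIS_COURT`, `outgoing` and `definition_path` are hypothetical names, not Extended WordNet's actual distribution format.

```python
# Disambiguated gloss of "tennis court" as (head, relation, tail) triples,
# copied from the slide's diagram.
TENNIS_COURT = [
    ("tennis court", "def", "court"),
    ("court", "location-of", "play"),
    ("play", "object", "tennis"),
]


def outgoing(triples, node):
    """Return the (relation, tail) pairs leaving a node."""
    return [(rel, tail) for head, rel, tail in triples if head == node]


def definition_path(triples, start):
    """Walk from the defined term along the first outgoing edge of each node."""
    parts, node, seen = [start], start, {start}
    while (edges := outgoing(triples, node)):
        rel, node = edges[0]
        if node in seen:  # stop on a cycle instead of looping forever
            break
        seen.add(node)
        parts.append(f"-{rel}-> {node}")
    return " ".join(parts)


print(definition_path(TENNIS_COURT, "tennis court"))
# tennis court -def-> court -location-of-> play -object-> tennis
```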
“Serve: A stroke in tennis that puts the ball in play.”
– serve –def→ stroke –context→ tennis
– put –agent→ stroke, put –object→ ball, put –manner→ play (“in play”)
Extended WordNet
• Resulting structure is no longer just a big graph; it has two layers:
– Original WordNet: “raw” concepts (isa hierarchy, other relations)
– Processed glossary definitions: concepts in context (particular subtypes/situations for concepts)
– A concept such as ball is linked (def) to its processed definition in the second layer
Part I: Adding Relevant, Contextual
Knowledge from WordNet
“The kid hit the ball very hard.”
– hit –agent→ kid
– hit –object→ ball
– hit –manner→ hard
“Inference Extraction”
• Goals:
– provide supplementary information about a sentence
– explain the relation between sentences
• Approach:
– Deductive inference (e.g., snore –entails→ sleep)
– Find and add information into the sentence representation
• Challenge:
– Many possible connections
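As a toy illustration of the deductive part of the approach, assuming a hand-built edge table: chaining transitive relations outward from snore yields the conclusions in the authors' snoring example (sleeping, then temporarily unconscious). The `sleep –causes→ unconscious` edge and the `DEDUCTIVE` relation set are assumptions made for this sketch.

```python
from collections import deque

# Toy relation table based on the snoring example; the "causes" edge from
# sleep to unconscious is an assumption for illustration.
EDGES = {
    "snore": [("entails", "sleep")],
    "sleep": [("causes", "unconscious")],
}
DEDUCTIVE = {"entails", "causes", "isa"}  # relations assumed safe to chain


def deduce(concept):
    """Return every concept reachable through transitive, deductive relations."""
    found, seen, queue = [], {concept}, deque([concept])
    while queue:
        node = queue.popleft()
        for rel, target in EDGES.get(node, []):
            if rel in DEDUCTIVE and target not in seen:
                seen.add(target)
                found.append(target)
                queue.append(target)
    return found


print(deduce("snore"))  # ['sleep', 'unconscious']
```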
Path-finding
To find path(s) between A and B:
• use spreading activation/marker passing:
– place markers at A and B
– propagate markers to neighboring nodes
– at quiescence, look for marker collisions
• “Propagation rules” determine when to propagate
– “asymmetric and transitive relations are more useful”
– “going up the isa hierarchy allows hierarchical deductions”
– “the same is true for relations such as entail and causation.
For example, if a man is snoring, then he is sleeping, and
further he is temporarily unconscious.”
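A minimal marker-passing sketch under stated assumptions: markers move only head-to-tail along relations in a hypothetical `PROPAGATE` set (standing in for the paper's propagation rules, whose exact form is not given), spreading stops at quiescence or after `max_steps` rounds, and nodes marked from both A and B are reported as collisions. The edge list is a toy fragment of the tennis example.

```python
from collections import defaultdict

# Toy (head, relation, tail) edges from the tennis example.
EDGES = [
    ("kid", "isa", "person"),
    ("player", "isa", "person"),
    ("player", "agent-of", "hit"),
    ("hit", "object", "ball"),
]
# Assumed propagation rule: markers only travel along these relations, head -> tail.
PROPAGATE = {"isa", "entails", "causes"}


def pass_markers(a, b, max_steps=5):
    """Place markers at a and b, spread them until quiescence, return collisions."""
    marks = defaultdict(set)  # node -> origins whose marker has reached it
    marks[a].add(a)
    marks[b].add(b)
    frontier = {a: {a}, b: {b}}
    for _ in range(max_steps):
        new_frontier = defaultdict(set)
        for node, origins in frontier.items():
            for head, rel, tail in EDGES:
                if head == node and rel in PROPAGATE:
                    fresh = origins - marks[tail]
                    if fresh:
                        marks[tail] |= fresh
                        new_frontier[tail] |= fresh
        if not new_frontier:  # quiescence: no marker moved this round
            break
        frontier = new_frontier
    return {node for node, origins in marks.items() if {a, b} <= origins}


print(pass_markers("kid", "player"))  # {'person'}
```

With only isa links allowed, kid and player collide at person; widening `PROPAGATE` finds more connections but, as the Challenge bullet warns, also many more irrelevant ones.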
“The kid hit the ball very hard.”
• Find connections which “explain” these relations:
– Within the context of ball: hit –context→ game
– Within the context of tennis: game –object-of→ play –agent→ player –agent-of→ hit –object→ ball
– Via the isa hierarchy: player –isa→ person ←isa– kid
“The kid hit the ball very hard.”
• Find connections which “explain” these relations, e.g., hit –manner→ hard:
– Within the context of drive: hard –manner-of→ return
– Within the context of return: return –gloss (“isa”)→ stroke –context→ tennis
– Within the context of tennis: tennis –gloss (“isa”)→ game –object-of→ play –agent→ player –agent-of→ hit
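To make “explaining” a sentence relation concrete, here is a sketch that searches the context edges read off the slide for the shortest labelled path between the two ends of hit –manner→ hard. Breadth-first search and traversing edges in both directions are choices made for this illustration, not necessarily what TextNet does; `explain` and `CONTEXT_EDGES` are hypothetical names.

```python
from collections import deque

# Edges read off the slide (contexts of drive, return and tennis);
# "gloss-isa" marks an isa link taken from a processed gloss.
CONTEXT_EDGES = [
    ("hard", "manner-of", "return"),
    ("return", "gloss-isa", "stroke"),
    ("stroke", "context", "tennis"),
    ("tennis", "gloss-isa", "game"),
    ("game", "object-of", "play"),
    ("play", "agent", "player"),
    ("player", "agent-of", "hit"),
]


def explain(src, dst, edges):
    """Breadth-first search for the shortest labelled path from src to dst.

    Edges are followed in both directions; a reversed edge is shown as <-rel-.
    """
    adj = {}
    for head, rel, tail in edges:
        adj.setdefault(head, []).append((f"-{rel}->", tail))
        adj.setdefault(tail, []).append((f"<-{rel}-", head))
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for label, nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = (node, label)
                queue.append(nxt)
    if dst not in prev:
        return None  # no explanation found in this graph
    parts, node = [dst], dst
    while prev[node] is not None:
        before, label = prev[node]
        parts = [before, label] + parts
        node = before
    return " ".join(parts)


print(explain("hard", "hit", CONTEXT_EDGES))
# hard -manner-of-> return -gloss-isa-> stroke -context-> tennis -gloss-isa-> game -object-of-> play -agent-> player -agent-of-> hit
```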
Inter-sentential Global Context
• Find connections between “local contexts”
S1: The kid hit the ball very hard.
S2: It landed almost always near the baseline.
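– S1: hit –isa→ move; within the context of move: move –gloss (“isa”)→ change –object→ location
– S2: land –isa→ arrive; within the context of arrive: arrive –gloss (“isa”)→ reach –object→ destination
– Within the context of destination: destination –gloss (“isa”)→ place –isa→ location
– The two local contexts meet at location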
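A sketch of how the two local contexts could be bridged, assuming each concept has a single “upward” edge as in the slide's chains: follow each sentence's verb up its chain and report the first concept the chains share. `UP`, `chain` and `bridge` are hypothetical; a real graph with branching edges would need the full marker-passing search.

```python
# Upward edges read off the slide: node -> (relation, next node).
UP = {
    "hit": ("isa", "move"),
    "move": ("gloss-isa", "change"),
    "change": ("object", "location"),
    "land": ("isa", "arrive"),
    "arrive": ("gloss-isa", "reach"),
    "reach": ("object", "destination"),
    "destination": ("gloss-isa", "place"),
    "place": ("isa", "location"),
}


def chain(node):
    """List the nodes reached by repeatedly following the single upward edge."""
    nodes = [node]
    while node in UP and UP[node][1] not in nodes:  # guard against cycles
        node = UP[node][1]
        nodes.append(node)
    return nodes


def bridge(a, b):
    """Return the first concept on a's chain that also lies on b's chain."""
    on_b = set(chain(b))
    return next((n for n in chain(a) if n in on_b), None)


print(chain("hit"))           # ['hit', 'move', 'change', 'location']
print(bridge("hit", "land"))  # location
```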
Part II: Adding Contextual Knowledge
from the Internet
Is WordNet (or a dictionary)
sufficient to fully build the context?
“GPS systems are used for hiking.”
• Question: Can we relate “GPS” and “hiking” using a dictionary?
• From Oxford Dictionary:
– “GPS: a navigation system”
– “Hiking: long walk in the countryside taken for pleasure”
– “Walk: place or track or route for foot passengers”
– “Route: course or way taken from starting point to destination”
• But:
– Missing knowledge that hiking involves following/navigating a
particular trail, as opposed to just wandering aimlessly
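The gap can be checked mechanically with a small sketch, assuming the four quoted glosses are the whole dictionary and using a hand-picked stopword list: expanding each term through its glosses never produces a shared word, because nothing links hiking's walk/route/destination chain to navigation. `GLOSSES`, `STOPWORDS` and `expand` are hypothetical names.

```python
import re

# The four Oxford glosses quoted on the slide, treated as the whole dictionary.
GLOSSES = {
    "gps": "a navigation system",
    "hiking": "long walk in the countryside taken for pleasure",
    "walk": "place or track or route for foot passengers",
    "route": "course or way taken from starting point to destination",
}
STOPWORDS = {"a", "in", "the", "for", "or", "to", "from"}  # assumed list


def content_words(text):
    """Lower-case the gloss and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def expand(word, depth=3):
    """Collect content words reachable by repeatedly looking up glosses."""
    seen, frontier = set(), {word}
    for _ in range(depth):
        nxt = set()
        for w in frontier:
            for c in content_words(GLOSSES.get(w, "")):
                if c not in seen:
                    seen.add(c)
                    nxt.add(c)
        frontier = nxt
    return seen


shared = expand("gps") & expand("hiking")
print(shared)  # set()
```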
Finding and Adding Extra, Contextual
Knowledge from the Internet
• WordNet doesn’t contain all the background knowledge
• So can we add extra knowledge using other texts too?
– run-time, extra elaboration of the current graph
– further expansion of WordNet?
• Approach:
1. Start with some initial “seed” text
2. Retrieve paragraphs containing relevant words
3. Elaborate their “local and global contexts”
4. Determine relevance using a similarity measure
5. Select “the most appropriate new context”
6. Add its graph (or parts of it?) to the original graph
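Steps 4 to 6 might look like the following sketch, assuming graphs are sets of (head, relation, tail) triples and using Jaccard overlap of concept nodes as the similarity measure; this is a stand-in, since the slides do not say which measure TextNet used. `overlap`, `elaborate` and `threshold` are hypothetical. The second candidate reads ball as a dance, mimicking the lexical-ambiguity problem.

```python
def nodes(graph):
    """All concepts mentioned as head or tail of some triple."""
    return {h for h, _, _ in graph} | {t for _, _, t in graph}


def overlap(seed, candidate):
    """Jaccard similarity of concept nodes (a stand-in similarity measure)."""
    a, b = nodes(seed), nodes(candidate)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def elaborate(seed, candidates, threshold=0.1):
    """Steps 4-6: score candidates, keep the best, layer it onto the seed graph."""
    best = max(candidates, key=lambda g: overlap(seed, g), default=None)
    if best is None or overlap(seed, best) < threshold:
        return set(seed)  # nothing relevant enough: keep the original graph
    return set(seed) | set(best)


seed = {("hit", "agent", "kid"), ("hit", "object", "ball"), ("hit", "manner", "hard")}
tennis_ctx = {("play", "agent", "player"), ("player", "agent-of", "hit"), ("hit", "object", "ball")}
dance_ctx = {("ball", "isa", "dance"), ("dance", "location", "hall")}  # the wrong sense of "ball"

richer = elaborate(seed, [tennis_ctx, dance_ctx])
print(len(richer))  # 5
```

The tennis context shares two of six concepts with the seed and the dance context only one, so the tennis graph is the one layered on; note that the wrong sense still scores above zero, which is why the threshold matters.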
Finding Relevant Documents
• Two problems:
– Discovery: Which keywords to search with?
• use words in the original seed text, or closely related words
• e.g., “play AND (tennis OR ball OR baseline) AND hit”
– Quality: How relevant are the results?
• measure the degree of overlap of graphs for seed and new texts
• Lexical ambiguity is a root problem
– Disambiguation by assuming new words belong to the same or closely related synsets as in the original query (dubious!)
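Building the Boolean query shown above can be sketched as follows, assuming each clause is a list of words: a single word is ANDed in directly and a group of closely related words becomes an OR group. `build_query` is hypothetical, and the syntax is generic rather than any particular search engine's.

```python
def build_query(clauses):
    """Join clauses with AND; a clause listing several related words becomes an OR group."""
    rendered = []
    for words in clauses:
        if len(words) == 1:
            rendered.append(words[0])
        else:
            rendered.append("(" + " OR ".join(words) + ")")
    return " AND ".join(rendered)


print(build_query([["play"], ["tennis", "ball", "baseline"], ["hit"]]))
# play AND (tennis OR ball OR baseline) AND hit
```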
A Real Example…
• Text: about a player who gets tendinitis from hitting the ball too hard
• Build initial graph of sentences (but info missing)
• Look for additional information on the Internet:
1. try multiple queries
2. select the best result (= graph most coherent with original text)
3. layer this graph on top of the original text graph
• Original text + WordNet:
– hit –isa→ affect ←isa– injure –result→ injury
– hit –purpose→ land –location→ backline
• Internet text:
– backline –result→ ace
• WordNet:
– ace –isa→ serve –attr→ unreachable –purpose→ win
• Hence (!):
– “Winning is the motivation for actions causing tennis injuries”
Summary
• Interesting, ambitious
• Right idea (used by others too)
• Didn’t work (?); no further publications on TextNet
• Critical details not clear from the paper
– The problem is not finding good connections; rather, it is avoiding finding bad connections