Machine Reading from Multiple Texts
Peter Clark and John Thompson
Boeing Research and Technology
What is Machine Reading?
• Not (just) parsing, fact extraction
• Construction of a coherent representation of the scene the text describes
• Challenge: much of that representation is not in the text
“A soldier was killed in a gun battle”
→ The soldier died
→ The soldier was shot
→ There was a fight
→ …
What we are trying to do:
• Multiple text approach:
  • Reduce need for precision/coverage on individual texts
  • Assess confidence using redundancy
  • Exploit the vast amount of text available
• Domains: two-stroke engines, Pearl Harbor
What we’re trying to do: Two-stroke engines
Multiple Input Texts
Text 1: ...the mixture of fuel and air in the cylinder has been compressed. This mixture ignites when the spark plug generates a spark. Igniting the mixture causes an explosion. The explosion forces the piston down....
Text 2: ...The piston compresses the air-fuel mixture in the combustion chamber. The vacuum in the crankcase sucks a fresh mixture of air-fuel-oil into the cylinder. The spark from the spark plug begins the combustion stroke…
Output: Single, Coherent Representation
• Compress mixture
• Suck in fresh mixture
• Generate spark
• Ignite mixture
• Mixture explodes
What we’re trying to do: Pearl Harbor
Multiple Input Texts
Text 1: …at 6am, the first attack wave of 183 Japanese planes takes off from the carriers and heads for Pearl Harbor. At 7:53am the first Japanese assault wave commences the attack, targeting airfields and battleships. Eight battleships are damaged, with five sunk…
Text 2: As the sun was just beginning to rise, a fleet of Japanese forces were taking off from carriers in various locations in the Pacific. At 7:55am, just as many islanders were waking up for breakfast, the first Japanese bomb was dropped on Wheeler Field, eight miles from Pearl Harbor….Most planes returned to their carriers intact…
Output: Single, Coherent Representation
• Japanese planes take off
• Planes fly to Pearl Harbor
• Planes bomb airfields & ships
• Eight battleships damaged
• Planes return
Incredibly Challenging
• Basic language processing is hard
  → Need high-quality language engine
• Multiple alignments and implications of text
  → Treat reading as model building, not fact extraction
• Multiple viewpoints/perspectives
  → Knowledge-guided model extraction process
Basic language processing is hard
• Usual suspects: syntax, WSD, SRL, LF, NE, …
• Discourse structure contains much implicit knowledge (e.g., parts, event ordering)
A two-stroke engine's combustion stroke occurs when the spark plug
fires. At the beginning of the combustion stroke, the mixture of fuel
and air in the cylinder has been compressed. This mixture ignites
when the spark plug generates a spark. Igniting the mixture causes an
explosion. The explosion forces the piston down. The piston
compresses the mixture in the crankcase as it moves down. As the
piston approaches the bottom of its stroke, the exhaust port is
uncovered. The pressure in the cylinder forces exhaust gases out of
the cylinder. As the piston reaches the bottom of the cylinder, the
intake port is uncovered. The piston's movement pressurizes the
mixture in the crankcase. The mixture displaces the burned gases in
the cylinder.
Finding Equivalences, Entailments, and Matches
...The piston compresses the air-fuel mixture in the combustion chamber. The vacuum in the crankcase sucks a fresh mixture of air-fuel-oil into the cylinder. The spark from the spark plug begins the combustion stroke…
...the mixture of fuel and air in the cylinder is compressed. This mixture ignites when the spark plug generates a spark. Igniting the mixture causes an explosion….
• Basic operation: relating (then integrating) texts
[Figure: one text is labeled T and the other H, linked by the candidate relations =?, →?, ←?, ?]
Textual “Entailment” = The “Modus Ponens” of NLU
Recognizing Textual Entailment (RTE)
T: The piston's movement pressurizes the mixture.
H: The piston compresses the mixture.
• Task: does H “reasonably” follow from T?
  • (or: what is the relationship between T and H?)
• Annual RTE competition for 4 years
• Is very difficult, and largely unsolved still
  • typical scores ~50%-70% (baseline is 50%)
  • RTE4 (2008): Mean score was 57.5%
Examples
A few are easy(ish)….
T: The piston's movement pressurizes the mixture.
H: The piston compresses the mixture.
but most are difficult…
T: A 1,760 pound armor-piercing shell slammed through the deck
and hit the ship’s forward ammunition magazine.
H: A 1,760 pound bomb penetrated into the front of the ship.
Boeing’s RTE System
1. Interpret texts using BLUE (Boeing Language Understanding Engine)
2. See if:
   • H subsumes (is implied by) T
     H: “An animal eats a mouse” ← T: “A black cat eats a mouse”
   • H subsumes an elaboration of T
     H: “An animal digests a mouse” ← T: “A black cat eats a mouse”
     via IF X eats Y THEN X digests Y
Two sources of World Knowledge
• WordNet subsumption and part of speech relations
• DIRT paraphrases
BLUE’s Pipeline
Input: “Igniting the mixture causes an explosion.”
Parse + Logical form:
(DECL ((VAR _X1 "the" "mixture")
       (VAR _X2 NIL (S (ING) NIL "ignite" _X1))
       (VAR _X3 "an" "explosion"))
      (S (PRESENT) _X2 "cause" _X3))
Initial Logic:
"mixture"(mixture01),
"ignite"(ignite01),
sobject(ignite01,mixture01),
"explosion"(explosion01),
"cause"(cause01),
subject(cause01,ignite01),
sobject(cause01,explosion01).
Final Logic:
isa(mixture01,mixture_n1),
isa(ignite01,light_v4),
isa(explosion01,explosion_n1),
causes(ignite01,explosion01),
object(ignite01,mixture01).
“Lexico-semantic inference”
• Subsumption
T: A black cat ate a mouse
   subject(eat01,cat01), object(eat01,mouse01), mod(cat01,black01)
H: A mouse was eaten by an animal
   “by”(eat01,animal01), object(eat01,mouse01)
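The subsumption test on this slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not BLUE's implementation: the `HYPERNYMS` table is a hand-written stand-in for WordNet, the triples use word lemmas instead of instance names such as `eat01`, the passive “by” relation is assumed to be already normalized to `subject`, and the function names are hypothetical.

```python
# Toy stand-in for WordNet hypernym links (hypothetical, hand-written).
HYPERNYMS = {"cat": "feline", "feline": "animal", "mouse": "rodent", "rodent": "animal"}

def is_a(word, general):
    """True if `general` equals `word` or is one of its ancestors in HYPERNYMS."""
    while word is not None:
        if word == general:
            return True
        word = HYPERNYMS.get(word)
    return False

def subsumes(h, t):
    """H subsumes T if every H triple is matched by an equal or more specific T triple."""
    return all(
        any(hr == tr and is_a(ta, ha) and is_a(tb, hb) for (tr, ta, tb) in t)
        for (hr, ha, hb) in h
    )

# T: "A black cat ate a mouse"
T = [("subject", "eat", "cat"), ("object", "eat", "mouse"), ("mod", "cat", "black")]
# H: "A mouse was eaten by an animal" (passive "by" assumed normalized to subject)
H = [("subject", "eat", "animal"), ("object", "eat", "mouse")]

print(subsumes(H, T))  # True: every H triple generalizes some T triple
print(subsumes(T, H))  # False: T is more specific than H
```

The sketch matches triples independently, so it does not check that the same instance is bound consistently across triples; a fuller version would need that.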
With Inference…
T: A black cat ate a mouse
IF X isa cat THEN X has a tail
IF X eats Y THEN X digests Y
T’: A black cat ate a mouse. The cat has a tail.
The cat digests the mouse. The cat chewed the
mouse. The cat is furry. ….
Subsumes
H: An animal digested the mouse.
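One way to picture “elaborate T, then test subsumption” is the sketch below. It is an assumption-laden toy, not the Boeing system: facts are simplified to (predicate, subject, object) triples, only the rule IF X eats Y THEN X digests Y is encoded, `HYPERNYMS` is a hand-written stand-in for WordNet, and all names are hypothetical.

```python
HYPERNYMS = {"cat": "animal", "mouse": "animal"}  # toy stand-in for WordNet

# Each rule rewrites one predicate into another, keeping the arguments:
# IF X eats Y THEN X digests Y.
RULES = {"eat": ["digest"]}

def is_a(word, general):
    """True if `general` equals `word` or is one of its ancestors in HYPERNYMS."""
    while word is not None:
        if word == general:
            return True
        word = HYPERNYMS.get(word)
    return False

def elaborate(facts):
    """Forward-chain the rules until no new (predicate, subject, object) fact appears."""
    known = set(facts)
    frontier = list(facts)
    while frontier:
        pred, subj, obj = frontier.pop()
        for new_pred in RULES.get(pred, []):
            fact = (new_pred, subj, obj)
            if fact not in known:
                known.add(fact)
                frontier.append(fact)
    return known

def entails(t_facts, h_fact):
    """True if H subsumes some fact in the elaboration of T."""
    hp, hs, ho = h_fact
    return any(p == hp and is_a(s, hs) and is_a(o, ho)
               for p, s, o in elaborate(t_facts))

T = [("eat", "cat", "mouse")]                      # "A black cat ate a mouse"
print(entails(T, ("digest", "animal", "mouse")))   # True
print(entails(T, ("chase", "animal", "mouse")))    # False
```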
Acquiring paraphrase/inference rules
• Where do the rules come from?
  IF X loves Y THEN X likes Y
• paraphrasing technology can learn these, e.g., DIRT
[Figure: for each pattern, frequency histograms of the words (table, chair, bed, cat, dog, Fred, Sue, person) that fill its X slot and its Y slot. “X loves Y” is compared first with “X falls to Y” and then with “X likes Y”.]
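The idea behind the histograms, that two patterns are candidate paraphrases when similar words fill their X and Y slots, can be sketched as below. Several things here are assumptions: the counts are made up for illustration and are not corpus statistics, and plain cosine similarity is swapped in for the mutual-information-based slot similarity that DIRT itself uses. The function names are hypothetical.

```python
import math
from collections import Counter

# Made-up slot-filler counts for illustration only (not real corpus statistics).
FILLERS = {
    "X loves Y":    {"X": Counter(Fred=8, Sue=7, person=5, dog=2),
                     "Y": Counter(Sue=6, Fred=5, dog=4, cat=3)},
    "X likes Y":    {"X": Counter(Fred=7, Sue=6, person=6, cat=1),
                     "Y": Counter(Sue=5, Fred=4, dog=5, cat=4)},
    "X falls to Y": {"X": Counter(chair=5, table=4, person=3),
                     "Y": Counter(bed=6, table=3, chair=2)},
}

def cosine(a, b):
    """Cosine similarity between two word-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def pattern_similarity(p, q):
    """Geometric mean of the X-slot and Y-slot similarities of two patterns."""
    return math.sqrt(cosine(FILLERS[p]["X"], FILLERS[q]["X"])
                     * cosine(FILLERS[p]["Y"], FILLERS[q]["Y"]))

print(pattern_similarity("X loves Y", "X likes Y"))     # high: similar fillers
print(pattern_similarity("X loves Y", "X falls to Y"))  # low: different fillers
```

A rule such as IF X loves Y THEN X likes Y would be proposed when the score exceeds some threshold; because the score is symmetric and purely distributional, it can also pair patterns that are related but not entailing.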
Some selected paraphrases from DIRT
IF Sergei organizes a symposium THEN:
Sergei promotes a symposium.
Sergei participates in a symposium.
Sergei makes preparations for a symposium.

Sergei intensifies a symposium.
Sergei denounces a symposium.
Sergei urges a boycott of a symposium.

Good Entailments and Alignments
...The pressure in the cylinder
displaces the burned gases from
cylinder….
(DIRT) IF Y is displaced from X THEN Y pours out of X
the burned gases pour out of the cylinder
(WordNet)
…Burned gases flow out of the cylinder
through the exhaust port….
Good Entailments and Alignments
…The piston’s movement
pressurizes the mixture in the
crankcase….
(DIRT) IF X’s movement changes Y THEN X changes Y
the piston pressurizes the mixture
(WordNet)
...The piston compresses the mixture in
the crankcase….
Bad Entailments
...The burned air-fuel mixture exits the
cylinder through the exhaust port…
(DIRT) IF X exits Y THEN X squeezes into Y

the mixture squeezes into the cylinder
(WordNet)
The air-fuel mixture goes into the cylinder
as the piston moves….
Other entailments
... Following the explosion, the exploding gases
push the piston, forcing it down the cylinder…
the gases drive the piston
the piston is moved down by the gases
the gases pull the piston
the piston militates against the gases
…….
The Bottom Line
• Simply finding local alignments, and computing local implications, is not enough
  • Machine-learned world knowledge is too noisy
  • Local decisions are unacceptably error-prone
• Reading is not (just) a set of local processes
• Rather: Also need a “global” aspect:
  Machine Reading = a process of model formation
  • a search for a “most coherent” set of facts
[Figure, shown in three builds. Source texts: “The exploding gases push the piston down the cylinder…” and “The explosion of the gases drive the piston…”. A “Text Interpretation” step and an “Entailments” step produce the candidate statements: “The gases push the piston down.”, “The gases push the piston.”, “The gases drive the piston.”, “The gases propel the piston.”, “The gases pull the piston.”, “The gases race the piston.”, “The gases are moved by the piston.”]
Best, consistent subset of elaborations
= Overall, integrated theory
Is a Markov-based search process:
• Can transform this to a satisfiability problem…
• Maximize (weighted) number of happy (satisfied) formulae!

Propositions:                    Best assignment:
P1: gases push piston down       t
P2: gases drive piston           t
P3: gases pull piston            f
P4: gases propel piston          t

Formulae (“Things we’d like to be true”):
Source          Weight   Formula          Results in
Given fact      ∞        P1               t
Given fact      ∞        P2               t
DIRT rule       10       P1 → P3          f
DIRT rule       8        P1 → P4          t
DIRT rule       10       P2 → P4          t
Inconsistent    ∞        not (P1 & P3)    t    (facts can’t both hold)
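The weighted satisfiability problem on this slide is small enough to solve by brute force, which makes the “best assignment” easy to check. The sketch below enumerates all 16 truth assignments; exhaustive enumeration is a stand-in chosen for clarity, not the Markov-based search the slide refers to, and the function names are hypothetical. The inconsistency constraint is read as not (P1 and P3).

```python
import math
from itertools import product

PROPS = ["P1", "P2", "P3", "P4"]

# (weight, formula) pairs; math.inf marks a hard constraint.
FORMULAE = [
    (math.inf, lambda a: a["P1"]),                     # given fact
    (math.inf, lambda a: a["P2"]),                     # given fact
    (10,       lambda a: (not a["P1"]) or a["P3"]),    # DIRT rule P1 -> P3
    (8,        lambda a: (not a["P1"]) or a["P4"]),    # DIRT rule P1 -> P4
    (10,       lambda a: (not a["P2"]) or a["P4"]),    # DIRT rule P2 -> P4
    (math.inf, lambda a: not (a["P1"] and a["P3"])),   # push and pull can't both hold
]

def score(assignment):
    """Sum of satisfied soft weights, or None if a hard constraint is violated."""
    total = 0
    for weight, formula in FORMULAE:
        satisfied = formula(assignment)
        if weight == math.inf:
            if not satisfied:
                return None
        elif satisfied:
            total += weight
    return total

def best_assignment():
    """Try every truth assignment and keep the highest-scoring admissible one."""
    best, best_score = None, -1
    for values in product([True, False], repeat=len(PROPS)):
        assignment = dict(zip(PROPS, values))
        s = score(assignment)
        if s is not None and s > best_score:
            best, best_score = assignment, s
    return best, best_score

assignment, total = best_assignment()
print(assignment, total)
# {'P1': True, 'P2': True, 'P3': False, 'P4': True} 18
```

The hard constraints force P1 and P2 true and P3 false, so the rule P1 → P3 (weight 10) is sacrificed, while setting P4 true satisfies the other two rules for a total of 18.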
Incredibly Challenging
• Basic language processing is hard
  → Need high-quality language engine
• Many possible equivalences and implications
  → Treat reading as model building, not fact extraction
• Multiple viewpoints/perspectives
  → Knowledge-guided model extraction process
Want: [figure]
Got:
Fuchida shouted “Tora! Tora!”
The ships reached position.
It was a sunny day.
The attack was audacious.
• Do have coherent, supported facts
• BUT:
  • There’s a lot going on in any scene!
  • Multiple viewpoints and levels of detail
• Better: Use world knowledge to guide what to look for
  • e.g., scripts of generalized event sequences
[Figure label: Expectations/Scripts]
(Entailment-like reasoning again!)
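As a rough picture of how a script could guide what to look for, the sketch below keeps only the extracted facts that fill a slot in an expected event sequence and orders them by that sequence. Everything in it is an assumption made for illustration: the slides do not show how scripts are represented, the slot names and trigger verbs are invented, and each fact is assumed to arrive already tagged with the lemma of its main verb.

```python
# Hypothetical "air raid" script: ordered slots, each with trigger verb lemmas.
SCRIPT = [
    ("launch",    {"take off"}),
    ("approach",  {"fly", "head"}),
    ("strike",    {"bomb", "attack"}),
    ("aftermath", {"damage", "sink"}),
    ("withdraw",  {"return"}),
]

# (main verb lemma, sentence) pairs, in no particular order.
FACTS = [
    ("be",       "It was a sunny day."),
    ("bomb",     "Planes bomb airfields & ships."),
    ("return",   "Planes return."),
    ("take off", "Japanese planes take off."),
    ("be",       "The attack was audacious."),
    ("damage",   "Eight battleships damaged."),
    ("fly",      "Planes fly to Pearl Harbor."),
]

def apply_script(script, facts):
    """Return (slot, sentence) pairs in script order; unmatched facts are dropped."""
    timeline = []
    for slot, triggers in script:
        for verb, sentence in facts:
            if verb in triggers:
                timeline.append((slot, sentence))
    return timeline

for slot, sentence in apply_script(SCRIPT, FACTS):
    print(f"{slot:10s} {sentence}")
```

Exact verb matching is the crudest possible test; the slide's note about entailment-like reasoning suggests that deciding whether a fact fills a slot would itself use the subsumption and paraphrase machinery described earlier.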
System can still make mistakes…
….Japanese submarines attacked Pearl Harbor…
….torpedoes attacked Pearl Harbor…
….bombers attacked Pearl Harbor…
WordNet sandwich#n2: “submarine”, “hoagie”, “torpedo”, “sandwich”, “poor boy”, “bomber”: a large sandwich made with meat and cheese
→ Pearl Harbor is being attacked by sandwiches (!)
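The failure is easy to reproduce in miniature: if the three attackers are generalized without word sense disambiguation, the only sense they share is the sandwich one. The sketch below uses a toy sense table with invented sense labels, not WordNet itself, and the slides do not spell out the system's actual reasoning path, so treat it as one plausible reconstruction.

```python
# Toy sense inventory modeled on the slide; the sense labels are hypothetical.
SENSES = {
    "submarine": {"warship", "sandwich"},
    "torpedo":   {"weapon", "sandwich"},
    "bomber":    {"aircraft", "sandwich"},
}

def shared_senses(words):
    """Senses common to all words when no disambiguation is applied."""
    return set.intersection(*(SENSES[w] for w in words))

print(shared_senses(["submarine", "torpedo", "bomber"]))  # {'sandwich'}
```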
Summary
• Machine Reading from multiple texts
  • tolerate gaps, ambiguity, errors through redundancy
• Three critical requirements
  • High-quality language engine
  • Reading as model building, not fact extraction
    • entailment technology as “modus ponens” of NLU
    • search for coherence to overcome (many) local errors
  • Knowledge-guided model extraction process
    • expectations to guide what to look for
• Implications of success are huge!