Machine Reading as a Process of Partial Question-Answering
Peter Clark and Phil Harrison
Boeing Research & Technology
June 2010
Overview
• Machine Reading and Question-Answering
• Approach
• Algorithm
• Preliminary Results
• Summary
Machine Reading
• Machine Reading =
• A “holy grail” of AI
• Constructing an inference-supporting representation from text
• Connecting what is read with what is already known
• Reader already knows something
• Text is elaborating/deepening that knowledge
Machine Reading / Question-Answering
Both ask:
• Do I already know this?
• Can I interpret this as something that I know?
• Can I interpret some of this as something I know?
Question-Answering: any remainder = failed query
Machine Reading: any remainder = new knowledge
Machine Reading / Question-Answering
Main insight: these are similar processes.
Can apply question-answering techniques to machine reading.
Machine Reading
Why is that important?
Question-answering is precisely a technology for linking what is said (asked) with what is known.
i.e., to read text T, ask: Is it true that T?
Overview
• Machine Reading and Question-Answering
• Approach
• Algorithm
• Preliminary Results
• Summary
General Approach
Text: “The mitotic spindle consists of hollow microtubules.”
Question: “Does the mitotic spindle consist of hollow microtubules?”
Partial Answer: “The mitotic spindle has parts [hollow] microtubules”
New Knowledge: “Those microtubules are hollow”
Knowledge has guided interpretation
..and identified the “anchor points” in the KB for new knowledge
Pipelined (KB independent) NLP
“During prophase, the cell…” → Parse, logical form → Word-Sense Disambiguation → Semantic Role Labeling → Topic in the KB ?
Interleaved Interpretation and Answering
[Animated diagram. Elements: the text “During prophase, the cell…”, its Logical Form, and the Topic in the KB]
Callouts, in order of appearance:
• Existing Knowledge (repeated over several steps)
• Suppose this is the best we can do, interpreting text as existing knowledge
• Traditional NLP
• New Knowledge
• Extended KB
• Word sense choices, semantic role choices, paraphrase rewrites
Some Possible Semantic Role Labels…
“DNA synthesized by the polymerase”
Candidate roles, each checked against the KB: location? agent? means?
Some Possible Paraphrases (DIRT)…
“spindle consists of microtubules”
Candidate paraphrases, each checked against the KB:
“spindle is staffed by microtubules”
“microtubules are part of the spindle”
“microtubules participate in the spindle”
…
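The idea of letting the KB choose among candidate readings can be sketched in a few lines. This is only an illustration: the mini-KB, the candidate table, and the function name `supported_readings` are all hypothetical and are not taken from the system described in the slides.

```python
# Hypothetical mini-KB of (subject, relation, object) facts; illustrative only.
KB = {("Mitotic-Spindle", "has-part", "Microtubule")}

# Hypothetical candidate KB relations for a textual relation phrase.
CANDIDATES = {"consist of": ["material", "has-part"]}

def supported_readings(subj, phrase, obj, kb=KB, candidates=CANDIDATES):
    """Return the candidate relations for `phrase` that the KB already asserts."""
    return [rel for rel in candidates.get(phrase, []) if (subj, rel, obj) in kb]

print(supported_readings("Mitotic-Spindle", "consist of", "Microtubule"))
# ['has-part']
```

A reading that the KB supports is preferred; if none is supported, the choice would fall back to ordinary NLP preferences.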
Overview
• Machine Reading and Question-Answering
• Approach
• Algorithm
• Preliminary Results
• Summary
Knowledge Representation
• Ontology:
  • ~400 biology concepts, ~400 general concepts
• Axioms:
  • Mainly “Forall…exists…” axioms, e.g.,
    • “All eukaryotic cells contain a nucleus”
    • “Subevents of mitosis are prophase, metaphase, …”
• Inference:
  • Reason about an instance of a concept
  • Conclusions apply to all instances of the concept (via universal generalization, UG)
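One way to picture reasoning about an instance of a concept is to create a single instance and let the “forall…exists” axioms introduce the individuals they imply. The sketch below assumes a toy axiom table; `AXIOMS`, `instantiate`, the relation names, and the `max_depth` guard against cyclic axioms are hypothetical choices, not the representation used by the authors.

```python
from itertools import count

# Hypothetical axiom table: every instance of the key concept has, via the
# given relation, some individual of the given concept. Illustrative only.
AXIOMS = {
    "Eukaryotic-Cell": [("has-part", "Nucleus")],
    "Mitosis": [("subevent", "Prophase"), ("subevent", "Metaphase")],
}

def instantiate(concept, axioms=AXIOMS, max_depth=3):
    """Create one instance of `concept` plus the individuals its axioms imply."""
    ids = count()
    root = f"X{next(ids)}"
    facts = [("isa", root, concept)]
    frontier = [(root, concept, 0)]
    while frontier:
        inst, cls, depth = frontier.pop(0)
        if depth >= max_depth:  # stop expanding if axioms are cyclic or deep
            continue
        for relation, filler_cls in axioms.get(cls, []):
            filler = f"X{next(ids)}"  # fresh individual for the "exists" part
            facts.append(("isa", filler, filler_cls))
            facts.append((relation, inst, filler))
            frontier.append((filler, filler_cls, depth + 1))
    return facts

for fact in instantiate("Mitosis"):
    print(fact)
```

Because nothing is assumed about the instance beyond its concept, whatever is derived for it holds for every instance of that concept.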
Topics
Topic: Prophase
The centrosomes are pushed apart to opposite ends of the cell
nucleus by the action of molecular motors acting on the
microtubules. The nuclear envelope breaks down, allowing….
• Topic = the concept that a text describes
• We assume a text is about a single topic
• Topic could be identified using ML (we do it by hand)
• Given topic, can find (some) expected “participants” from KB
Topics
• Participants = Individuals implied to exist given the topic
• Can infer (some) participants using the KB
KB: Prophase
→ centrosome moves to the pole of a eukaryotic cell
→ nucleus, cytoplasm
→ nuclear membrane, etc.
• Text provides information about participants
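Finding the expected participants of a topic can be viewed as collecting everything reachable from the topic's instance in the KB graph. The following is a minimal sketch under that assumption; the edge list is invented for illustration (it is not read off the slides) and `participants` is a hypothetical helper.

```python
from collections import deque

# Hypothetical instance graph as (source, relation, target) edges.
# The edges are guesses for illustration, not the authors' KB.
EDGES = [
    ("Prophase", "subevent", "Move"),
    ("Move", "object", "Centrosome"),
    ("Move", "destination", "Pole"),
]

def participants(topic, edges=EDGES):
    """Return individuals reachable from `topic`, in breadth-first order."""
    seen, order, queue = {topic}, [], deque([topic])
    while queue:
        node = queue.popleft()
        for source, _relation, target in edges:
            if source == node and target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order

print(participants("Prophase"))
# ['Move', 'Centrosome', 'Pole']
```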
Algorithm
Topic: Prophase
“The mitotic spindle consists of hollow microtubules.”
1. Setup
• Identify the topic of the text
• Parse and create initial “logical form”:
  "mitotic-spindle"(s), "consist"(c), "hollow"(h), "microtubule"(m), subject(c,s), "of"(c,m), modifier(m,h).
• Create representation of topic + (known) participants in KB
2. Search:
repeat: interpret + (try to) prove parts of the LF
until: as much proved as possible
Interpret remainder (normal NLP) and add to KB
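The search step can be sketched as matching logical-form literals against KB facts, binding variables as matches succeed and keeping whatever cannot be proved as the remainder. This is a simplified, greedy sketch with no backtracking; it assumes the LF has already been rewritten into KB vocabulary, and the names `match`, `read`, and the `?x` variable convention are hypothetical.

```python
def is_var(term):
    """Variables are written with a leading '?', e.g. '?s'."""
    return term.startswith("?")

def match(literal, fact, bindings):
    """Unify one LF literal with one KB fact; return extended bindings or None."""
    if len(literal) != len(fact) or literal[0] != fact[0]:
        return None
    new = dict(bindings)
    for term, value in zip(literal[1:], fact[1:]):
        term = new.get(term, term)  # substitute an existing binding, if any
        if is_var(term):
            new[term] = value
        elif term != value:
            return None
    return new

def read(lf, kb):
    """Greedily prove LF literals against the KB; return (bindings, known, new)."""
    bindings, known, remainder = {}, [], []
    for literal in lf:
        for fact in sorted(kb):
            extended = match(literal, fact, bindings)
            if extended is not None:
                bindings = extended
                known.append(literal)
                break
        else:
            remainder.append(literal)  # could not be proved: candidate new knowledge
    new = [(lit[0],) + tuple(bindings.get(t, t) for t in lit[1:]) for lit in remainder]
    return bindings, known, new

kb = {("isa", "Y4", "Mitotic-Spindle"), ("isa", "Y7", "Microtubule"),
      ("has-part", "Y4", "Y7")}
lf = [("isa", "?s", "Mitotic-Spindle"), ("isa", "?m", "Microtubule"),
      ("has-part", "?s", "?m"), ("shape", "?m", "?h"), ("isa", "?h", "Hollow")]
bindings, known, new = read(lf, kb)
print(bindings)  # {'?s': 'Y4', '?m': 'Y7'}
print(new)       # [('shape', 'Y7', '?h'), ('isa', '?h', 'Hollow')]
```

The unproved literals come back already anchored to known individuals (here `Y7`), which is what lets the new knowledge be attached at the right place in the KB.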
“The mitotic spindle consists of hollow microtubules.”
Create a representation of the topic in the KB
[KB graph. Nodes: X0:Prophase, Y0:Move, Y1:Centrosome, Y2:Eukaryotic-Cell, Y3:Elongate, Y4:Mitotic-Spindle, Y5:Pole, Y6:Create, Y7:Microtubule, … Edge labels: subevent, object, destination, has-part, has-region]
Generate Logical Form
LF interpretation:
"mitotic-spindle"(s), "consist"(c), "hollow"(h), "microtubule"(m), subject(c,s), "of"(c,m), mod(m,h)
Interpret and (try) prove some part of the LF
Bind a LF variable:
isa(Y4,MSpindle), "consist"(c), "hollow"(h), "microtubule"(m), subject(c,Y4), "of"(c,m), mod(m,h)
First candidate (?):
isa(Y4,MSpindle), "hollow"(h), "microtubule"(m), material(Y4,m), mod(m,h).
Second candidate (?):
isa(Y4,MSpindle), "hollow"(h), "microtubule"(m), has-part(Y4,m), mod(m,h).
Recognized Old Knowledge (!):
isa(Y4,MSpindle), "hollow"(h), isa(Y7,Microtubule), has-part(Y4,Y7), modifier(Y7,h).
Traditional NLP for the rest…
isa(Y4,MSpindle), isa(Y8,Hollow), isa(Y7,Microtubule), has-part(Y4,Y7), shape(Y7,Y8).
Add to the KB
[KB graph as before, extended with New Knowledge: Y8:Hollow, attached via shape(Y7,Y8)]
Overview
• Machine Reading and Question-Answering
• Approach
• Algorithm
• Illustration and Preliminary Results
• Summary
Illustration
Input Text + Topic (here, Prophase):
“During prophase, chromosomes become visible, the nucleolus disappears, the
mitotic spindle forms, and the nuclear envelope disappears. Chromosomes become
more coiled and can be viewed under a light microscope. Each duplicated
chromosome is seen as a pair of sister chromatids joined by the duplicated but
unseparated centromere. The nucleolus disappears during prophase. In the
cytoplasm, the mitotic spindle, consisting of microtubules and other proteins, forms
between the two pairs of centrioles as they migrate to opposite poles of the cell. The
nuclear envelope disappears at the end of prophase. This signals the beginning of
the substage called prometaphase.”
Output Axioms (expressed in English):
In all prophase events:
• The chromosome moves.
• The chromatids are attached by the centromere.
• The nucleolus disappears during the prophase.
• The mitotic spindle has parts the microtubule and the protein.
• The mitotic spindle is created between the centrioles in the cytoplasm.
• The centrioles move to the poles.
• The nuclear envelope disappears at the end.
• Something signals.
Callouts shown in turn on the output axioms above:
• Good interpretation using paraphrases
• Useful New Knowledge
• Good interpretation
• Not very useful
• Bad interpretation
A Preliminary Experiment
• 10 paragraphs (110 sentences) about prophase, from the Web
• 114 logic statements created
  • 23 (20%) fully known to the KB
  • 27 (24%) partially new knowledge
  • 64 (56%) completely new knowledge
• Biologist ranked the statements (expressed in English) as:
  • c = correct; useful knowledge for the KB
  • q = questionable; not useful (meaningless, vague)
  • i = incorrect

A Preliminary Experiment
Statements that are:
                Fully known   Mixture of known & new   Fully new
Correct              22                 19                 25
Questionable          1                  8                 38
Incorrect             0                  0                  1
“The membrane break down”
• Questionable due to poor rendering in English, not the original logic
• Mixture of known & new: 70% judged correct (19 of 27)
• Fully new: 39% judged correct (25 of 64)
• Is extracting and integrating some useful knowledge
• Potentially useful as interactive tool
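The quoted percentages can be rechecked from the nine counts. The sketch below reads them as a 3×3 table (rows: correct, questionable, incorrect; columns: fully known, mixture, fully new), an arrangement that is assumed here because its column sums match the 23, 27, and 64 statements reported earlier; the names `COUNTS` and `percent_correct` are hypothetical.

```python
# Counts from the preliminary experiment, keyed by how much of each
# statement the KB already knew, then by the biologist's judgement.
COUNTS = {
    "fully known": {"correct": 22, "questionable": 1, "incorrect": 0},
    "mixture of known & new": {"correct": 19, "questionable": 8, "incorrect": 0},
    "fully new": {"correct": 25, "questionable": 38, "incorrect": 1},
}

def percent_correct(column):
    """Share of statements in `column` judged correct, as a whole percent."""
    cell = COUNTS[column]
    return round(100 * cell["correct"] / sum(cell.values()))

for column in COUNTS:
    print(column, sum(COUNTS[column].values()), percent_correct(column))
```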
Summary
To read T, ask “Is it true that T?”
• Clearly only a first step
  • Simple KR, single parse, contradictions, noisy, …
• But:
  • Interpretation guided by knowledge
  • Identifies the “hooks” for new knowledge
  • Is a “real” context for machine reading