
Learning to “Read Between the Lines”
using Bayesian Logic Programs
Sindhu Raghavan, Raymond Mooney, and Hyeonseo Ku
The University of Texas at Austin
July 2012
1
Information Extraction
• Information extraction (IE) systems extract factual
information that occurs in text [Cowie and Lenhert, 1996;
Sarawagi, 2008]
• Natural language text is typically “incomplete”
– Commonsense information is not explicitly stated
– Easily inferred facts are omitted from the text
• Human readers use commonsense knowledge and
“read between the lines” to infer implicit information
• IE systems have no access to commonsense
knowledge and hence cannot infer implicit information
2
Example
Natural language text
“Barack Obama is the President of the
United States of America.”
Query
“Barack Obama is a citizen of what country?”
IE systems cannot answer this query since
citizenship information is not explicitly stated!
3
Objective
• Infer implicit facts from explicitly stated
information
– Extract explicitly stated facts using an IE system
– Learn commonsense knowledge in the form of
logical rules to deduce additional facts
– Employ models from statistical relational
learning (SRL) that allow probabilities to be
estimated using well-founded probabilistic
graphical models
4
Related Work
• Learning propositional rules [Nahm and Mooney,
2000]
– Learn propositional rules from the output of an
IE system on computer-related job postings
– Perform logical deduction to infer new facts
– Purely logical deduction is brittle
• Cannot assign probabilities or confidence
estimates to inferences
5
Related Work
• Learning first-order rules
– Logical deduction using probabilistic rules [Carlson
et al., 2010; Doppa et al., 2010]
• Modify existing rule learners like FOIL and FARMER
to learn probabilistic rules
• Probabilities are not computed using well-founded
probabilistic graphical models
– Use Markov Logic Networks (MLNs) [Domingos and
Lowd, 2009] based approaches to infer additional
facts [Schoenmackers et al., 2010; Sorower et al., 2011]
• Grounding process could result in intractably large
networks for large domains
6
Related Work
• Learning for Textual Entailment [Lin and Pantel, 2001;
Yates and Etzioni, 2007; Berant et al., 2011]
– Textual entailment rules have a single antecedent
in the body of the rule
– Approaches from statistical relational learning
have not been applied so far
– Do not use extractions from a traditional IE system
to learn rules
7
Our Approach
• Use an off-the-shelf IE system to extract facts
• Learn commonsense knowledge from the
extracted facts in the form of probabilistic
first-order rules
• Infer additional facts based on the learned
rules using Bayesian Logic Programs
(BLPs) [Kersting and De Raedt, 2001]
8
System Architecture
[Pipeline diagram, reconstructed as text]
Training documents, e.g., “Barack Obama is the current President of
USA… Obama was born on August 4, 1961, in Hawaii, USA.”
→ Information Extractor (IBM SIRE) → Extracted facts:
nationState(USA), Person(BarackObama), isLedBy(USA,BarackObama),
hasBirthPlace(BarackObama,USA), hasCitizenship(BarackObama,USA)
→ Inductive Logic Programming (LIME) → First-order logical rules:
nationState(B) ∧ isLedBy(B,A) ⇒ hasCitizenship(A,B)
nationState(B) ∧ employs(B,A) ⇒ hasCitizenship(A,B)
→ BLP Weight Learner (version of EM) → Bayesian Logic Program (BLP):
hasCitizenship(A,B) | nationState(B), isLedBy(B,A)   0.9
hasCitizenship(A,B) | nationState(B), employs(B,A)   0.6
Test document extractions:
nationState(malaysian), Person(mahathir-mohamad),
isLedBy(malaysian,mahathir-mohamad), employs(malaysian,mahathir-mohamad)
→ BLP Inference Engine → Inferences with probabilities:
hasCitizenship(mahathir-mohamad, malaysian)   0.75
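Read as a data flow, the diagram composes four stages. The sketch below is a minimal Python outline of that flow; every function name is a hypothetical stand-in for the component named in the diagram (IBM SIRE, LIME, the EM weight learner, the BLP inference engine), not an actual API.

```python
# Hypothetical stage functions mirroring the diagram's data flow. None of
# these names come from the real systems (IBM SIRE, LIME, the BLP engine);
# they only make the pipeline's inputs and outputs explicit.
def extract_facts(documents): ...     # IE step: text -> ground facts
def learn_rules(facts): ...           # ILP step (LIME): facts -> first-order rules
def learn_weights(rules, facts): ...  # EM step: rules + facts -> BLP parameters
def infer(blp, test_facts): ...       # BLP engine: facts -> inferences with probabilities

def pipeline(train_docs, test_doc):
    facts = extract_facts(train_docs)
    rules = learn_rules(facts)
    blp = learn_weights(rules, facts)
    return infer(blp, extract_facts([test_doc]))
```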
9
Bayesian Logic Programs
[Kersting and De Raedt, 2001]
• Set of Bayesian clauses a | a1, a2, …, an
– Definite clauses in first-order logic, universally quantified
– Head of the clause: a
– Body of the clause: a1, a2, …, an
– Associated conditional probability table (CPT)
• P(head | body)
• Bayesian predicates a, a1, a2, …, an have finite
domains
– Combining rule like noisy-or for mapping multiple CPTs
into a single CPT
• Given a set of Bayesian clauses and a query, SLD
resolution is used to construct ground Bayesian
networks for probabilistic inference
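To make the clause notation concrete, here is a minimal Python representation of a Bayesian clause and its CPT parameter; the class and field names are illustrative, not taken from the authors' BLP implementation.

```python
from dataclasses import dataclass

@dataclass
class BayesianClause:
    # A Bayesian clause a | a1, ..., an: a universally quantified definite
    # clause plus the CPT entry P(head = true | body = true). The atom
    # encoding as (predicate, args) tuples is illustrative only.
    head: tuple                # e.g. ("hasCitizenship", ("A", "B"))
    body: list                 # e.g. [("nationState", ("B",)), ("isLedBy", ("B", "A"))]
    p_head_given_body: float   # CPT parameter used by the noisy-or combining rule

rule1 = BayesianClause(
    head=("hasCitizenship", ("A", "B")),
    body=[("nationState", ("B",)), ("isLedBy", ("B", "A"))],
    p_head_given_body=0.9,  # illustrative value from the running example
)
```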
10
Why BLPs?
• Pure logical deduction is brittle and results in
many undifferentiated inferences
• Inference in BLPs is probabilistic, i.e.,
inferences are assigned probabilities
– Probabilities can be used to select only
high-confidence inferences
• Efficient grounding mechanism in BLPs
enables our approach to scale
11
Inductive Logic Programming (ILP)
for learning first-order rules
[Rule-learner diagram, reconstructed as text]
Target relation: hasCitizenship(X,Y)
Positive instances:
hasCitizenship(BarackObama, USA)
hasCitizenship(GeorgeBush, USA)
hasCitizenship(IndiraGandhi, India)
…
Negative instances:
hasCitizenship(BarackObama, India)
hasCitizenship(GeorgeBush, India)
hasCitizenship(IndiraGandhi, USA)
…
KB:
hasBirthPlace(BarackObama,USA)
person(BarackObama)
nationState(USA)
nationState(India)
…
Positive instances + negative instances + KB → ILP Rule Learner → Rules:
nationState(Y) ∧ isLedBy(Y,X) ⇒ hasCitizenship(X,Y)
…
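To picture the learner's input, the sketch below rebuilds the positive and negative instances shown above, generating negatives under a closed-world assumption; LIME's actual negative-example handling may differ (the experiments also learn from positives alone).

```python
# Illustrative construction of ILP training data for hasCitizenship under a
# closed-world assumption: every person-nation pair not listed as a positive
# instance of the target relation becomes a negative instance.
kb = {
    ("hasBirthPlace", ("BarackObama", "USA")),
    ("person", ("BarackObama",)),
    ("nationState", ("USA",)),
    ("nationState", ("India",)),
}
positives = {("hasCitizenship", ("BarackObama", "USA")),
             ("hasCitizenship", ("GeorgeBush", "USA")),
             ("hasCitizenship", ("IndiraGandhi", "India"))}

persons = {args[0] for pred, args in positives}
nations = {args[0] for pred, args in kb if pred == "nationState"}
negatives = {("hasCitizenship", (p, n))
             for p in persons for n in nations
             if ("hasCitizenship", (p, n)) not in positives}
# negatives now contains exactly the three negative instances shown above.
```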
12
Inference using BLPs
Test document
“Malaysian Prime Minister Mahathir Mohamad Wednesday
announced for the first time that he has appointed his deputy
Abdullah Ahmad Badawi as his successor.”
Extracted facts
nationState(malaysian)
Person(mahathir-mohamad)
isLedBy(malaysian,mahathir-mohamad)
employs(malaysian,mahathir-mohamad)
Learned rules
nationState(B) ∧ isLedBy(B,A) ⇒ hasCitizenship(A,B)
nationState(B) ∧ employs(B,A) ⇒ hasCitizenship(A,B)
13
Logical Inference in BLPs
Rule 1
nationState(B) ∧ isLedBy(B,A) ⇒ hasCitizenship(A,B)
nationState(malaysian)
isLedBy(malaysian,mahathir-mohamad)
hasCitizenship(mahathir-mohamad, malaysian)
14
Logical Inference in BLPs
Rule 2
nationState(B) ∧ employs(B,A) ⇒ hasCitizenship(A,B)
nationState(malaysian)
employs(malaysian,mahathir-mohamad)
hasCitizenship(mahathir-mohamad, malaysian)
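The sketch below reproduces these two derivations in Python by matching rule bodies against the extracted facts. Note the real BLP engine works backward from a query via SLD resolution; this forward pass is only illustrative and hard-codes two-literal bodies, as in the example.

```python
# Facts extracted from the test document, encoded as (predicate, args) tuples.
facts = {
    ("nationState", ("malaysian",)),
    ("Person", ("mahathir-mohamad",)),
    ("isLedBy", ("malaysian", "mahathir-mohamad")),
    ("employs", ("malaysian", "mahathir-mohamad")),
}
rules = [  # (body, head), with logical variables "A" and "B"
    ([("nationState", ("B",)), ("isLedBy", ("B", "A"))],
     ("hasCitizenship", ("A", "B"))),
    ([("nationState", ("B",)), ("employs", ("B", "A"))],
     ("hasCitizenship", ("A", "B"))),
]

def derive(facts, rules):
    derived = set()
    for (lit1, lit2), (head_pred, head_args) in rules:
        for pred, args in facts:
            if pred != lit1[0]:
                continue
            b = args[0]  # bind B from the first body literal
            for pred2, args2 in facts:
                if pred2 == lit2[0] and args2[0] == b:
                    binding = {"A": args2[1], "B": b}  # bind A from the second
                    derived.add((head_pred,
                                 tuple(binding[v] for v in head_args)))
    return derived

print(derive(facts, rules))
# {('hasCitizenship', ('mahathir-mohamad', 'malaysian'))}
```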
15
Probabilistic inference in BLPs
[Ground Bayesian network, reconstructed as text]
nationState(malaysian) and isLedBy(malaysian, mahathir-mohamad) → dummy1
nationState(malaysian) and employs(malaysian, mahathir-mohamad) → dummy2
dummy1, dummy2 → (noisy-or) → hasCitizenship(mahathir-mohamad,
malaysian)
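With the illustrative parameters 0.9 and 0.6 from the architecture slide, the noisy-or combination of the two dummy nodes works out as follows (the 0.75 in the running example comes from actually learned parameters, not these round values):

```python
# Noisy-or over the two satisfied ground rules: the head fails only if both
# dummy causes fail, so P(head) = 1 - (1 - p1) * (1 - p2).
p_rule1, p_rule2 = 0.9, 0.6  # illustrative CPT parameters
p_head = 1.0 - (1.0 - p_rule1) * (1.0 - p_rule2)
print(p_head)  # 0.96
```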
16
Sample rules learned
governmentOrganization(A) ∧ employs(A,B) ⇒ hasMember(A,B)
eventLocation(A,B) ∧ bombing(A) ⇒ thingPhysicallyDamage(A,B)
isLedBy(A,B) ⇒ hasMemberPerson(A,B)
17
Experimental Evaluation
• Data
– DARPA’s intelligence community (IC) data
set from the Machine Reading Project (MRP)
– Consists of news articles on politics, terrorism,
and other international events
– 10,000 documents in total
• Perform 10-fold cross validation
18
Experimental Evaluation
• Learning first-order rules using LIME [McCreath
and Sharma, 1998]
– Learn rules for 13 target relations
– Learn rules using both positive and negative
instances and using only positive instances
– Include all unique rules learned from different
models
• Learning BLP parameters
– Learn noisy-or parameters using Expectation
Maximization (EM)
– Set priors to maximum likelihood estimates
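As a rough illustration of this parameter-learning step, here is a schematic batch EM loop for noisy-or parameters, one parameter per rule; the paper uses an online EM variant, so treat this as a sketch under simplified assumptions rather than the authors' algorithm.

```python
from math import prod

def em_noisy_or(examples, n_rules, iters=50):
    # examples: list of (active_rules, head_true) pairs, where active_rules
    # is the set of rule indices whose ground body holds in that example.
    p = [0.5] * n_rules  # uniform initialization
    for _ in range(iters):
        num = [0.0] * n_rules  # expected "successes" of each rule's cause
        den = [0.0] * n_rules  # examples in which each rule's body was active
        for active, head_true in examples:
            prob_head = 1.0 - prod(1.0 - p[i] for i in active)
            for i in active:
                den[i] += 1.0
                if head_true:
                    # E-step: P(cause i fired | head true) = p_i / P(head true)
                    num[i] += p[i] / prob_head
                # if the head is false, every hidden cause failed: contribute 0
        # M-step: re-estimate each parameter from its expected success rate
        p = [num[i] / den[i] if den[i] else p[i] for i in range(n_rules)]
    return p

# Toy data: the head held whenever rule 0's body held, but not for rule 1 alone.
print(em_noisy_or([({0}, True), ({0, 1}, True), ({1}, False)], n_rules=2))
```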
19
Experimental Evaluation
• Performance evaluation
– Manually evaluated inferred facts from 40
documents, randomly selected from each test set
– Compute two precision scores
• Unadjusted (UA) – does not account for extractor’s
mistakes
• Adjusted (AD) – accounts for the extractor’s mistakes
– Rank inferences using marginal probabilities and
evaluate top-n
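Assuming the adjusted score simply drops extraction-caused errors from the denominator, which matches the counts on the backup results slide, the two scores are computed as:

```python
# Unadjusted vs. adjusted precision, using the logical-deduction counts from
# the backup slide: 443 correct inferences out of 1490 evaluated, of which
# 1490 - 1257 = 233 were wrong only because the underlying extractions were
# wrong; the adjusted score removes those from the denominator.
correct, total, extraction_errors = 443, 1490, 233

unadjusted = correct / total                      # 443/1490
adjusted = correct / (total - extraction_errors)  # 443/1257
print(f"UA = {unadjusted:.2%}, AD = {adjusted:.2%}")
# UA = 29.73%, AD = 35.24%
```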
20
Experimental Evaluation
• Systems compared
– BLP Learned Weights
• Noisy-or parameters learned using online EM
– BLP Manual Weights
• Noisy-or parameters set to 0.9
– Logical Deduction
– MLN Learned Weights
• Learn weights using generative online weight learner
– MLN Manual Weights
• Assign a weight of 10 to all rules and MLE priors to all
predicates
21
Unadjusted Precision
[Plot: unadjusted precision (y-axis, 0–1) vs. top-n inferences (x-axis, 0–1000)
for BLP Manual Weights, BLP Learned Weights, MLN Manual Weights,
MLN Learned Weights, and Logical Deduction]
22
Adjusted Precision
[Plot: adjusted precision (y-axis, 0–1) vs. top-n inferences (x-axis, 0–1000)
for the same five systems]
23
Future Work
• Improve the performance of weight learning for
BLPs and MLNs
– Learn parameters on larger data sets
• Improve performance of MLNs
– Use open-world assumption for learning
– Add constraints required to prevent inference of facts
like employs(a,a)
– Specialize predicates whose arguments do not have strictly defined types
• Develop an online rule learner that can learn
rules from uncertain training data
24
Conclusions
• Efficient learning of probabilistic first-order rules that represent
commonsense knowledge using extractions from an IE system
• Inference of implicitly stated facts with high
precision using BLPs
• Superior performance of BLPs over purely
logical deduction and MLNs
25
Questions??
26
Back Up
27
Results for Logical Deduction
Unadjusted precision (UA): 29.73% (443/1490)
Adjusted precision (AD): 35.24% (443/1257)
28
Experimental Evaluation
• Learning BLP parameters
– Use logical-and model to combine evidence
from the conjuncts in the body of the clause
– Use noisy-or model to combine evidence from
several ground rules that have the same head
– Learn noisy-or parameters using Expectation
Maximization (EM)
– Set priors to maximum likelihood estimates
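Putting the two combining models together: a minimal sketch, assuming independent conjunct probabilities within a body; the function names are illustrative, not from the BLP implementation.

```python
from math import prod

def body_probability(conjunct_probs):
    # logical-and within one ground rule: the body holds only if every
    # conjunct holds, so with independent conjuncts multiply their probabilities
    return prod(conjunct_probs)

def head_probability(ground_rules):
    # noisy-or across all ground rules sharing the same ground head;
    # ground_rules: (conjunct_probs, rule_parameter) per ground rule
    return 1.0 - prod(1.0 - param * body_probability(conjuncts)
                      for conjuncts, param in ground_rules)

# Two ground rules with fully certain bodies and parameters 0.9 and 0.6:
print(head_probability([([1.0, 1.0], 0.9), ([1.0, 1.0], 0.6)]))  # 0.96
```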
29