Transcript Slides

[Portraits: Richard Montague and Andrey Markov]
Montague Meets Markov:
Deep Semantics with Probabilistic Logical Form
Islam Beltagy, Cuong Chau, Gemma Boleda,
Dan Garrette, Katrin Erk, Raymond Mooney
The University of Texas at Austin
Semantic Representations
• Formal Semantics
– Uses first-order logic
– Deep
– Brittle
• Distributional Semantics
– Statistical method
– Robust
– Shallow
• Goal: combine advantages of both logical
and distributional semantics in one
framework
Semantic Representations
• Combining both logical and distributional
semantics
– Represent meaning using a probabilistic logic (in
contrast with standard first-order logic)
• Markov Logic Network (MLN)
– Generate soft inference rules
• From distributional semantics
∀x hamster(x) → gerbil(x) | f(w)
Agenda
• Introduction
• Background: MLN
• RTE
• STS
• Future work and Conclusion
Markov Logic Networks
[Richardson & Domingos, 2006]
• MLN: Soft FOL
– Weighted rules (rule weight followed by FOL rule):
1.5  ∀x Smokes(x) ⇒ Cancer(x)
1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))
Markov Logic Networks
[Richardson & Domingos, 2006]
• MLN: Template for constructing Markov networks
1.5  ∀x Smokes(x) ⇒ Cancer(x)
1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))
• Two constants: Anna (A) and Bob (B)
[Ground Markov network over the atoms Smokes(A), Smokes(B), Cancer(A), Cancer(B), Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B)]
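To make the template idea concrete, here is a minimal Python sketch (not Alchemy itself) that grounds the two weighted formulas over the constants Anna (A) and Bob (B), producing the ground clauses behind the network above.

```python
from itertools import product

# Minimal sketch (not Alchemy): ground the two weighted formulas over the
# constants {A, B} to list the ground clauses of the resulting Markov network.
constants = ["A", "B"]  # Anna, Bob

ground_clauses = []

# 1.5  forall x: Smokes(x) => Cancer(x)
for x in constants:
    ground_clauses.append((1.5, f"Smokes({x}) => Cancer({x})"))

# 1.1  forall x,y: Friends(x,y) => (Smokes(x) <=> Smokes(y))
for x, y in product(constants, repeat=2):
    ground_clauses.append((1.1, f"Friends({x},{y}) => (Smokes({x}) <=> Smokes({y}))"))

for weight, clause in ground_clauses:
    print(weight, clause)
# 2 + 4 = 6 ground clauses over the 8 ground atoms listed above
```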
Markov Logic Networks
[Richardson & Domingos, 2006]
• Probability Mass Function (PMF)
P(X = x) = (1/Z) exp( Σᵢ wᵢ nᵢ(x) )
– x: a possible truth assignment
– Z: normalization constant
– wᵢ: weight of formula i
– nᵢ(x): number of true groundings of formula i in x
• Inference: calculate probability of atoms
– P(Cancer(Anna) | Friends(Anna,Bob), Smokes(Bob))
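As a sanity check on the formula, the following sketch brute-forces the distribution for an even smaller domain with a single constant A (three ground atoms, eight possible worlds); it is illustrative only, not how Alchemy computes probabilities.

```python
from itertools import product
from math import exp

# Brute-force P(X = x) = (1/Z) exp( sum_i w_i * n_i(x) ) for one constant A,
# so the ground atoms are Smokes(A), Cancer(A), Friends(A,A).
atoms = ["Smokes(A)", "Cancer(A)", "Friends(A,A)"]
weights = [1.5, 1.1]

def n_true_groundings(world):
    """[n_1(x), n_2(x)]: true groundings of each formula in truth assignment x."""
    smokes, cancer = world["Smokes(A)"], world["Cancer(A)"]
    n1 = 1 if (not smokes or cancer) else 0  # Smokes(A) => Cancer(A)
    n2 = 1  # Friends(A,A) => (Smokes(A) <=> Smokes(A)) is trivially true with one constant
    return [n1, n2]

worlds = [dict(zip(atoms, values)) for values in product([False, True], repeat=len(atoms))]
scores = [exp(sum(w * n for w, n in zip(weights, n_true_groundings(x)))) for x in worlds]
Z = sum(scores)                              # normalization constant

for world, score in zip(worlds, scores):
    print(world, round(score / Z, 3))        # P(X = x)
```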
Agenda
• Introduction
• Background: MLN
• RTE
• STS
• Future work and Conclusion
Recognizing Textual Entailment (RTE)
• Given two sentences, a premise and a hypothesis, does the first entail the second?
• e.g.:
– Premise: “A male gorilla escaped from his cage in
Berlin zoo and sent terrified visitors running for
cover, the zoo said yesterday.”
– Hypothesis: “A gorilla escaped from his cage in a
zoo in Germany. ”
– Entails: true
System Architecture
[Pipeline diagram: Sent1, Sent2 → BOXER → LF1, LF2; Vector Space → Dist. Rule Constructor → Rule Base; LF1, LF2 + Rule Base → ALCHEMY MLN Inference → result]
• BOXER [Bos et al., 2004]: maps sentences to
logical form
• Distributional Rule constructor: generates
relevant soft inference rules based on
distributional similarity
• ALCHEMY: probabilistic MLN inference
• Result: degree of entailment
Sample Logical Forms
• Premise: “A man is cutting pickles”
– ∃x,y,z ( man(x) ^ cut(y) ^ agent(y, x) ^ pickles(z) ^ patient(y, z) )
• Hypothesis: “A guy is slicing cucumber”
– ∃x,y,z ( guy(x) ^ slice(y) ^ agent(y, x) ^ cucumber(z) ^ patient(y, z) )
• Hypothesis in the query form
– analogous to the negated hypothesis in standard theorem proving
– ∀x,y,z ( guy(x) ^ slice(y) ^ agent(y, x) ^ cucumber(z) ^ patient(y, z) → result() )
• Query
– result() [Degree of Entailment]
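To make the division of labor explicit, here is a small sketch of the pieces the MLN inference step receives for this pair; the constants M1, C1, P1 and the rule weights are illustrative placeholders, not output of the actual system.

```python
# Premise as ground evidence (existential variables replaced by fresh constants).
evidence = ["man(M1)", "cut(C1)", "agent(C1, M1)", "pickles(P1)", "patient(C1, P1)"]

# Hypothesis in query form: if the hypothesis is satisfied, result() becomes true.
query_rule = ("∀x,y,z guy(x) ^ slice(y) ^ agent(y, x) ^ cucumber(z) ^ patient(y, z)"
              " → result()")

# Soft distributional rules (weights here are made-up placeholders for f(cos)).
rule_base = [
    (2.1, "∀x man(x) → guy(x)"),
    (1.7, "∀x cut(x) → slice(x)"),
    (0.9, "∀x pickles(x) → cucumber(x)"),
]

query_atom = "result()"   # P(result() | evidence, rules) = degree of entailment

print("\n".join(evidence))
print(query_rule)
for weight, rule in rule_base:
    print(weight, rule)
print(query_atom)
```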
Distributional Lexical Rules
• For every pair of words (a, b) where a is in S1 and b is in S2, add a soft rule relating the two
– ∀x a(x) → b(x) | wt(a, b)
– wt(a, b) = f(cos(a, b)), where a and b are the distributional vectors of the two words
• Premise: “A man is cutting pickles”
• Hypothesis: “A guy is slicing cucumber”
– ∀x man(x) → guy(x) | wt(man, guy)
– ∀x cut(x) → slice(x) | wt(cut, slice)
– ∀x pickle(x) → cucumber(x) | wt(pickle, cucumber)
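A minimal sketch of how such rules could be generated from word vectors; the 3-dimensional vectors are toy placeholders (not real distributional vectors) and f is left as the identity.

```python
import numpy as np

# Toy 3-d stand-ins for distributional vectors; real ones come from a corpus.
vectors = {
    "man":      np.array([0.9, 0.1, 0.0]),
    "guy":      np.array([0.8, 0.2, 0.1]),
    "cut":      np.array([0.1, 0.9, 0.2]),
    "slice":    np.array([0.2, 0.8, 0.1]),
    "pickle":   np.array([0.1, 0.2, 0.9]),
    "cucumber": np.array([0.0, 0.3, 0.8]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def lexical_rules(premise_words, hypothesis_words, weight_fn=lambda sim: sim):
    """One soft rule per (premise word, hypothesis word) pair, weighted by weight_fn(cosine)."""
    return [(weight_fn(cosine(vectors[a], vectors[b])), f"∀x {a}(x) → {b}(x)")
            for a in premise_words for b in hypothesis_words]

for weight, rule in lexical_rules(["man", "cut", "pickle"], ["guy", "slice", "cucumber"]):
    print(round(weight, 2), rule)
```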
Distributional Phrase Rules
• Premise: “A boy is playing”
• Hypothesis: “A little boy is playing”
• Need rules for phrases
– ∀x boy(x) → little(x) ^ boy(x) | wt(boy, "little boy")
• Compute vectors for phrases using vector addition [Mitchell & Lapata, 2010]
– "little boy" = little + boy
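Continuing the toy vectors above, a sketch of the additive composition: the "little boy" vector is just the sum of the two word vectors, and the rule weight is the cosine between boy and that sum.

```python
import numpy as np

# Toy placeholders again; real vectors would come from a distributional model.
vectors = {
    "boy":    np.array([0.7, 0.2, 0.1]),
    "little": np.array([0.1, 0.6, 0.3]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

little_boy = vectors["little"] + vectors["boy"]   # "little boy" = little + boy
weight = cosine(vectors["boy"], little_boy)       # wt(boy, "little boy")
print(round(weight, 2), '∀x boy(x) → little(x) ^ boy(x)')
```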
Preliminary Results: RTE-1 (2005)
System                              Accuracy
Logic only [Bos & Markert, 2005]    52%
Our System                          57%
Agenda
• Introduction
• Background: MLN
• RTE
• STS
• Future work and Conclusion
Semantic Textual Similarity (STS)
• Rate the semantic similarity of two sentences on a 0
to 5 scale
• Gold standards are averaged over multiple human
judgments
• Evaluate by measuring correlation to human rating
S1                            S2                             Score
A man is slicing a cucumber   A guy is cutting a cucumber    5
A man is slicing a cucumber   A guy is cutting a zucchini    4
A man is slicing a cucumber   A woman is cooking a zucchini  3
A man is slicing a cucumber   A monkey is riding a bicycle   1
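Evaluation is just the Pearson correlation between system scores and the averaged human ratings; a minimal sketch follows (the numbers are illustrative, not actual system output).

```python
from scipy.stats import pearsonr

gold   = [5.0, 4.0, 3.0, 1.0]   # averaged human judgments for the pairs above
system = [4.6, 4.1, 2.5, 0.8]   # illustrative system similarity scores

r, _ = pearsonr(gold, system)
print(f"Pearson r = {r:.2f}")
```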
Softening Conjunction for STS
• Logical conjunction requires satisfying all conjuncts to satisfy the clause, which is too strict for STS
• Hypothesis:
– ∀x,y,z ( guy(x) ^ cut(y) ^ agent(y, x) ^ cucumber(z) ^ patient(y, z) → result() )
• Break the sentence into “micro-clauses”, then combine them using an “averaging combiner” [Natarajan et al., 2010]
• Becomes:
– ∀x,y,z guy(x) ^ agent(y, x) → result()
– ∀x,y,z cut(y) ^ agent(y, x) → result()
– ∀x,y,z cut(y) ^ patient(y, z) → result()
– ∀x,y,z cucumber(z) ^ patient(y, z) → result()
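A small sketch of the splitting step: pair each content-word atom with the role atom(s) sharing its variable, which yields exactly the four micro-clauses above. The clause representation and the averaging step are simplified stand-ins, not the actual system's code.

```python
content_atoms = ["guy(x)", "cut(y)", "cucumber(z)"]
role_atoms    = ["agent(y, x)", "patient(y, z)"]

def micro_clauses(content_atoms, role_atoms):
    """Pair each content-word atom with every role atom that shares its variable."""
    clauses = []
    for c in content_atoms:
        var = c[c.index("(") + 1 : c.index(")")]     # variable of the content atom
        for r in role_atoms:
            if var in r:                             # role atom mentioning that variable
                clauses.append(f"∀x,y,z {c} ^ {r} → result()")
    return clauses

for clause in micro_clauses(content_atoms, role_atoms):
    print(clause)

def average_combiner(probabilities):
    """Simplified averaging combiner: mean of the micro-clause result() probabilities."""
    return sum(probabilities) / len(probabilities)
```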
Preliminary Results: STS 2012
• Microsoft video description corpus
– Sentence pairs given human 0-5 rating
– 1,500 pairs equally split into training/test
System                                                 Pearson r
Our System with no distributional rules [Logic only]   0.52
Our System with lexical rules                          0.60
Our System with lexical and phrase rules               0.73
Vector Addition [Distributional only]                  0.78
Ensemble our best score with vector addition           0.85
Best system in STS 2012 (large ensemble)               0.87
Agenda
• Introduction
• Background: MLN
• RTE
• STS
• Future work and Conclusion
Future Work
• Scale MLN inference to longer and more
complex sentences
• Use multiple parses to reduce impact of
parse errors
• Better Rule base
– Vector space methods for asymmetric weights
• wt(cucumber→vegetable) > wt(vegetable→cucumber)
– Inference rules from existing paraphrase
collections
– More sophisticated phrase vectors
Conclusion
• Using MLN to represent semantics
• Combining both logical and
distributional approaches
– Deep semantics: represent sentences
using logic
– Robust system:
• Probabilistic logic and soft inference rules
• Wide coverage of distributional semantics
Thank You