Montague Meets Markov: Deep Semantics with Probabilistic Logical Form
Islam Beltagy, Cuong Chau, Gemma Boleda, Dan Garrette, Katrin Erk, Raymond Mooney
The University of Texas at Austin
(Title-slide photos: Richard Montague and Andrey Markov)

Semantic Representations
• Formal semantics
  – Uses first-order logic
  – Deep
  – Brittle
• Distributional semantics
  – Statistical method
  – Robust
  – Shallow
• Goal: combine the advantages of both logical and distributional semantics in one framework

Semantic Representations
• Combining both logical and distributional semantics
  – Represent meaning using a probabilistic logic (in contrast with standard first-order logic)
    • Markov Logic Networks (MLNs)
  – Generate soft inference rules from distributional semantics
    • e.g. ∀x hamster(x) → gerbil(x) | f(w)

Agenda
• Introduction
• Background: MLN
• RTE
• STS
• Future work and conclusion

Markov Logic Networks [Richardson & Domingos, 2006]
• MLN: soft FOL, i.e. weighted FOL rules
  1.5  ∀x Smokes(x) ⇒ Cancer(x)
  1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))
  (rule weights on the left, FOL rules on the right)

Markov Logic Networks [Richardson & Domingos, 2006]
• MLN: a template for constructing Markov networks
  1.5  ∀x Smokes(x) ⇒ Cancer(x)
  1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))
• With two constants, Anna (A) and Bob (B), grounding yields a network over the atoms:
  Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B),
  Smokes(A), Smokes(B), Cancer(A), Cancer(B)

Markov Logic Networks [Richardson & Domingos, 2006]
• Probability mass function (PMF):
  P(X = x) = (1/Z) · exp( Σ_i w_i n_i(x) )
  where x is a possible truth assignment, Z is the normalization constant,
  w_i is the weight of formula i, and n_i(x) is the number of true groundings
  of formula i in x
• Inference: calculate the probability of query atoms, e.g.
  P(Cancer(Anna) | Friends(Anna,Bob), Smokes(Bob))
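To make the grounding and the PMF concrete, here is a minimal brute-force sketch in Python. It is not part of the original slides and not how ALCHEMY actually computes probabilities (ALCHEMY uses approximate inference); it simply enumerates all 256 truth assignments of the Anna/Bob example, which only works for tiny networks. All names in it are illustrative.

import itertools
import math

CONSTANTS = ["A", "B"]  # Anna and Bob

# Ground atoms of the network shown on the slide.
ATOMS = ([("Smokes", c) for c in CONSTANTS]
         + [("Cancer", c) for c in CONSTANTS]
         + [("Friends", a, b) for a in CONSTANTS for b in CONSTANTS])

def true_groundings(world):
    """n_i(x): number of true groundings of each weighted formula in world x."""
    smokes = lambda x: world[("Smokes", x)]
    cancer = lambda x: world[("Cancer", x)]
    friends = lambda x, y: world[("Friends", x, y)]
    # 1.5: forall x  Smokes(x) => Cancer(x)
    n1 = sum((not smokes(x)) or cancer(x) for x in CONSTANTS)
    # 1.1: forall x,y  Friends(x,y) => (Smokes(x) <=> Smokes(y))
    n2 = sum((not friends(x, y)) or (smokes(x) == smokes(y))
             for x in CONSTANTS for y in CONSTANTS)
    return [(1.5, n1), (1.1, n2)]

def unnormalized(world):
    """exp(sum_i w_i * n_i(x)); the constant Z cancels in the conditional below."""
    return math.exp(sum(w * n for w, n in true_groundings(world)))

def prob(query, evidence):
    """P(query | evidence) by exhaustive enumeration of all possible worlds."""
    num = den = 0.0
    for bits in itertools.product([False, True], repeat=len(ATOMS)):
        world = dict(zip(ATOMS, bits))
        if any(world[a] != v for a, v in evidence.items()):
            continue  # world contradicts the evidence
        w = unnormalized(world)
        den += w
        if world[query]:
            num += w
    return num / den

# P(Cancer(Anna) | Friends(Anna,Bob), Smokes(Bob))
print(prob(("Cancer", "A"),
           {("Friends", "A", "B"): True, ("Smokes", "B"): True}))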
Recognizing Textual Entailment (RTE)
• Given two sentences, a premise and a hypothesis, does the first entail the second?
• e.g.
  – Premise: "A male gorilla escaped from his cage in Berlin zoo and sent terrified visitors running for cover, the zoo said yesterday."
  – Hypothesis: "A gorilla escaped from his cage in a zoo in Germany."
  – Entails: true

System Architecture
(Diagram: Sent1 and Sent2 are mapped by BOXER to LF1 and LF2; the Distributional Rule Constructor builds a Rule Base from a Vector Space; ALCHEMY takes the logical forms and rule base and produces the MLN inference result.)
• BOXER [Bos et al., 2004]: maps sentences to logical form
• Distributional rule constructor: generates relevant soft inference rules based on distributional similarity
• ALCHEMY: probabilistic MLN inference
• Result: degree of entailment

Sample Logical Forms
• Premise: "A man is cutting pickles"
  – ∃x,y,z ( man(x) ^ cut(y) ^ agent(y, x) ^ pickles(z) ^ patient(y, z) )
• Hypothesis: "A guy is slicing cucumber"
  – ∃x,y,z ( guy(x) ^ slice(y) ^ agent(y, x) ^ cucumber(z) ^ patient(y, z) )
• Hypothesis in query form (an analogy to the negated hypothesis in standard theorem proving):
  – ∀x,y,z ( guy(x) ^ slice(y) ^ agent(y, x) ^ cucumber(z) ^ patient(y, z) → result() )
• Query: result() [degree of entailment]

Distributional Lexical Rules
• For every pair of words (a, b), where a is in S1 and b is in S2, add a soft rule relating the two:
  – ∀x a(x) → b(x) | wt(a, b)
  – wt(a, b) = f( cos(a, b) ), where a and b here denote the words' distributional vectors
• Premise: "A man is cutting pickles"
• Hypothesis: "A guy is slicing cucumber"
  – ∀x man(x) → guy(x) | wt(man, guy)
  – ∀x cut(x) → slice(x) | wt(cut, slice)
  – ∀x pickle(x) → cucumber(x) | wt(pickle, cucumber)

Distributional Phrase Rules
• Premise: "A boy is playing"
• Hypothesis: "A little boy is playing"
• Need rules for phrases:
  – ∀x boy(x) → little(x) ^ boy(x) | wt(boy, "little boy")
• Compute vectors for phrases using vector addition [Mitchell & Lapata, 2010]:
  – "little boy" = little + boy
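As an illustration of the rule constructor described on the two slides above, the following sketch (my own; the toy vectors are stand-ins for corpus-derived ones, and the weighting function f is an assumed simple scaling, since the slides leave f unspecified) computes wt(a, b) from cosine similarity and builds phrase vectors by addition:

import numpy as np

# Toy distributional vectors (placeholders for vectors learned from a corpus).
VECTORS = {
    "man":    np.array([0.9, 0.1, 0.3]),
    "guy":    np.array([0.8, 0.2, 0.3]),
    "boy":    np.array([0.7, 0.3, 0.2]),
    "little": np.array([0.1, 0.9, 0.4]),
}

def phrase_vector(words):
    """Additive composition [Mitchell & Lapata, 2010]: sum of word vectors."""
    return sum(VECTORS[w] for w in words)

def cosine(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def rule_weight(u, v, scale=2.0):
    """wt(a, b) = f(cos(a, b)); f is assumed here to be a simple scaling."""
    return scale * cosine(u, v)

# Lexical rule: forall x  man(x) -> guy(x) | wt(man, guy)
print(rule_weight(VECTORS["man"], VECTORS["guy"]))

# Phrase rule: forall x  boy(x) -> little(x) ^ boy(x) | wt(boy, "little boy")
print(rule_weight(VECTORS["boy"], phrase_vector(["little", "boy"])))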
Preliminary Results: RTE-1 (2005)

  System                               Accuracy
  Logic only [Bos & Markert, 2005]     52%
  Our system                           57%

Semantic Textual Similarity (STS)
• Rate the semantic similarity of two sentences on a 0 to 5 scale
• Gold standards are averaged over multiple human judgments
• Evaluate by measuring correlation with human ratings

  S1                            S2                              Score
  A man is slicing a cucumber   A guy is cutting a cucumber     5
  A man is slicing a cucumber   A guy is cutting a zucchini     4
  A man is slicing a cucumber   A woman is cooking a zucchini   3
  A man is slicing a cucumber   A monkey is riding a bicycle    1

Softening Conjunction for STS
• Logical conjunction requires satisfying all conjuncts to satisfy the clause, which is too strict for STS
• Hypothesis:
  – ∀x,y,z ( guy(x) ^ cut(y) ^ agent(y, x) ^ cucumber(z) ^ patient(y, z) → result() )
• Break the sentence into "micro-clauses", then combine them using an "averaging combiner" [Natarajan et al., 2010] (see the sketch after the closing slide)
• Becomes:
  – ∀x,y,z guy(x) ^ agent(y, x) → result()
  – ∀x,y,z cut(y) ^ agent(y, x) → result()
  – ∀x,y,z cut(y) ^ patient(y, z) → result()
  – ∀x,y,z cucumber(z) ^ patient(y, z) → result()

Preliminary Results: STS 2012
• Microsoft video description corpus
  – Sentence pairs given human 0-5 ratings
  – 1,500 pairs, equally split into training/test

  System                                                  Pearson r
  Our system with no distributional rules [logic only]    0.52
  Our system with lexical rules                           0.60
  Our system with lexical and phrase rules                0.73
  Vector addition [distributional only]                   0.78
  Ensemble of our best score with vector addition         0.85
  Best system in STS 2012 (large ensemble)                0.87

Future Work
• Scale MLN inference to longer and more complex sentences
• Use multiple parses to reduce the impact of parse errors
• Better rule base:
  – Vector space methods for asymmetric weights
    • wt(cucumber → vegetable) > wt(vegetable → cucumber)
  – Inference rules from existing paraphrase collections
  – More sophisticated phrase vectors

Conclusion
• Using MLNs to represent semantics
• Combining both logical and distributional approaches:
  – Deep semantics: represent sentences using logic
  – Robust system:
    • Probabilistic logic and soft inference rules
    • Wide coverage of distributional semantics

Thank You
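As a closing sketch of the clause-splitting step referenced on the "Softening Conjunction for STS" slide: the following simplification (mine, not the system's actual MLN encoding of [Natarajan et al., 2010], and with made-up probabilities) shows how averaging micro-clause scores, rather than conjoining all conjuncts, yields a graded result:

# Hypothesis: guy(x) ^ cut(y) ^ agent(y, x) ^ cucumber(z) ^ patient(y, z)
# Each micro-clause pairs one content predicate with the role atom that
# links it into the sentence, mirroring the four rules on the slide.
MICRO_CLAUSES = [
    ("guy", "agent"),
    ("cut", "agent"),
    ("cut", "patient"),
    ("cucumber", "patient"),
]

def clause_score(clause, atom_probs):
    """Score one micro-clause as the product of its atoms' probabilities
    given the premise (a stand-in for running MLN inference per clause)."""
    word, role = clause
    return atom_probs.get(word, 0.0) * atom_probs.get(role, 0.0)

def averaging_combiner(clauses, atom_probs):
    """Average the micro-clause scores instead of conjoining them, so a
    single unmatched conjunct no longer forces the whole result to zero."""
    return sum(clause_score(c, atom_probs) for c in clauses) / len(clauses)

# Toy probabilities for the hypothesis atoms given the premise
# "A man is cutting pickles" (made-up numbers, purely illustrative).
atom_probs = {"guy": 0.9, "cut": 0.9, "cucumber": 0.4,
              "agent": 1.0, "patient": 1.0}
print(averaging_combiner(MICRO_CLAUSES, atom_probs))  # graded degree, not 0/1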