Transcript of Lecture 27, CS 540 - Fall 2015 (Shavlik©), Week 13, 12/1/15
Today's Topics
• Read Chapter 21 (skip Section 21.5) of textbook
• Exam THURSDAY Dec 17, 5:30-7:30pm (here)
• Review of Fall 2014 Final on Dec 15
• TA Dmitry at Epic 5:30-7:30pm on Weds Dec 16?
• HW5 due Dec 8 (and no later than Dec 11)
• Probabilistic Logic
• Markov Logic Networks (MLNs) – a popular and successful probabilistic logic
• Collective Classification

Logic & Probability: Two Major Math Underpinnings of AI
Statistical relational learning sits at the intersection of logic and probabilities; MLNs are a popular approach.

Statistical Relational Learning (Intro to SRL, Getoor & Taskar (eds), MIT Press, 2007)
• Pure logic is too 'fragile' – everything must be either true or false
• Pure statistics doesn't capture/accept general knowledge well (tell it once rather than label N examples)
      ∀x human(x) → ∃y motherOf(x, y)
• Many approaches created over the years, especially the last few, including some at UWisc

Markov Logic Networks (Richardson and Domingos, MLJ, 2006)
• Use FOPC, but add weights to formulae ('syntax')
      wgt = 10   ∀x,y,z motherOf(x, z) ∧ fatherOf(y, z) → married(x, y)
  – weights represent the 'penalty' if a candidate world state violates the rule
  – for 'pure' logic, wgt = ∞
• Formulae are interpreted ('semantics') as a compact way to specify a type of graphical model called a Markov net
  – like a Bayes net, but with undirected arcs
  – probabilities in Markov nets are specified by clique potentials, but we won't cover them in CS 540

Using an MLN ('Inference')
• Assume we have a large knowledge base of probabilistic logic rules
• Assume we are given the truth values of N predicates (the 'evidence')
• We may be asked to estimate the most probable joint setting for M 'query' predicates
• Brute-force solution
  – Consider the 2^M possible 'complete world states'
  – Calculate the truth value of every grounded formula in each state
  – Return the state with the smallest total penalty for violated MLN rules (or, equivalently, the one with the largest sum of weights of satisfied rules)

Probability of Candidate World States
      Prob(specific world state) = (1/Z) exp( sum of the weights of the grounded formulae that are true in this world state )
Z is a normalizing term; computing it requires summing over all possible world states (challenging to estimate).
– A world state is a conjunction of ground predicates (eg, married(John, Sue), …, friends(Bill, Ann))
– If we only want the most probable world state, we don't need to compute Z
– If a world state violates a rule with infinite weight, the probability of that world state is zero (why?)
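To make the brute-force procedure above concrete, here is a minimal Python sketch (written for these notes, not part of the lecture). The grounded formulae, atom names, and weights are made-up placeholders; the sketch enumerates every complete world state, scores each by the summed weights of its satisfied grounded formulae, and returns the most probable state (no Z needed) as well as normalized probabilities (Z needed).

```python
from itertools import product
import math

# Hypothetical grounded MLN: each entry is (weight, formula), where a formula is a
# boolean test over a "world" mapping every ground atom to True/False.
# The atoms and weights below are made up purely for illustration.
atoms = ["p", "q", "r"]
formulas = [
    (2.0, lambda world: (not world["p"]) or world["q"]),   # p -> q,  wgt = 2
    (1.5, lambda world: world["q"] or world["r"]),          # q v r,   wgt = 1.5
]

def score(world):
    """Sum of the weights of the grounded formulae satisfied in this world state."""
    return sum(wgt for wgt, holds in formulas if holds(world))

# Enumerate all 2^N complete world states (only feasible for tiny N).
worlds = [dict(zip(atoms, values)) for values in product([False, True], repeat=len(atoms))]

# Most probable (MAP) world state: no normalizing term Z is needed.
map_state = max(worlds, key=score)
print("MAP state:", map_state)

# Probabilities do need Z, the sum of exp(score) over every world state.
Z = sum(math.exp(score(w)) for w in worlds)
for w in worlds:
    print(w, round(math.exp(score(w)) / Z, 3))
```

Real MLN systems avoid this full enumeration; the sampling methods mentioned later in the lecture exist precisely because the number of world states grows as 2^N.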
Grounding the MLN Formulae (replacing variables with constants)
• Assume we have this domain knowledge:
      wgt = 2   ∀x,y P(x,y) → Q(x)
      wgt = 3   ∀x   P(x,1) → (R(1) ∨ R(x))
• And these constants: 1 and 2
• So we have these 'grounded' rules (wgts not shown):
      P(1,1) → Q(1)
      P(1,2) → Q(1)
      P(2,1) → Q(2)
      P(1,1) → R(1)
      P(2,1) → (R(1) ∨ R(2))
      P(2,2) → Q(2)
• Aside: each grounded rule becomes a clique in the Markov network (like a CPT in a Bayes net)

Simple MLN Example
Have:
      wgt = 2   P → Q
      wgt = 7   P ∨ Q
Four possible world states:
      P       Q       Probability
      False   False   (1/Z) e^(2+0)   ≈ 1/e^8
      False   True    (1/Z) e^(2+7)   ≈ 1/e
      True    False   (1/Z) e^(0+7)   ≈ 1/e^3
      True    True    (1/Z) e^(2+7)   ≈ 1/e
The normalizing term:  Z = e^2 + e^9 + e^7 + e^9 ≈ e^10
(What is prob(P=true ∧ Q=false) in 'std' logic?)
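A quick sanity check of this table (again a sketch written for these notes): enumerate the four world states, sum the weights of the satisfied formulae, and normalize. The slide's 1/e-style values use the rough approximation Z ≈ e^10; the code computes Z exactly.

```python
import math

WGT_IMPLIES = 2.0   # weight of  P -> Q
WGT_OR      = 7.0   # weight of  P v Q

def score(p, q):
    """Summed weights of the formulae satisfied in world state (P=p, Q=q)."""
    total = 0.0
    if (not p) or q:   # P -> Q
        total += WGT_IMPLIES
    if p or q:         # P v Q
        total += WGT_OR
    return total

states = [(p, q) for p in (False, True) for q in (False, True)]
Z = sum(math.exp(score(p, q)) for p, q in states)   # exact Z (the slide rounds it to e^10)

print(f"Z = {Z:.1f}   e^10 = {math.exp(10):.1f}")
for p, q in states:
    print(f"P={p!s:5}  Q={q!s:5}  unnormalized = e^{score(p, q):.0f}  prob = {math.exp(score(p, q)) / Z:.4f}")
```

The printed probabilities differ slightly from the 1/e-style entries because those entries divide by the rounded Z ≈ e^10 rather than the exact sum.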
Collective Classification
• Assume we need to predict the outputs for N examples
• Could knowing (the probability that) example i is true impact (the probability that) example j is true? Ie, relaxing the iid assumption about examples
• For instance, "If Alice and Bob are friends, then if Alice likes a movie, Bob (probably) does as well."

Collective Classification in MLNs
• Imagine we have a bunch of inference rules for predicting likes(Person, Food)
• We could add this to our MLN ("People in the same city generally like the same sorts of food"):
      If   livesIn(?Person1, ?City) ∧ livesIn(?Person2, ?City) ∧ isaFood(?Food)
      Then [ likes(?Person1, ?Food) ↔ likes(?Person2, ?Food) ]   with wgt = 3
• So if we predicted the likes of N people, the MLN would be encouraged to give consistent answers

A Famous MLN Example (first rule modified to use ↔; don't buy these rules!)
"Smoking frequently causes cancer."
      wgt = 3   ∀x smokes(x) ↔ cancer(x)   // assume it's the ONLY cause
"Friends of smokers are likely to smoke."
      wgt = 2   ∀x,y friends(x, y) ∧ smokes(x) → smokes(y)
Assume the facts below, and suppose we want the probs of the four world states involving smoking or not of John and Mary – a simple collective classification example (try it yourself with THREE people!)
      friends(Mary, Mary), friends(Mary, John), ¬cancer(Mary)
      friends(John, Mary), friends(John, John), cancer(John)

A Famous MLN Example (2)
"Smoking frequently causes cancer," GROUNDED:
      wgt = 3   smokes(J) ↔ cancer(J)
      wgt = 3   smokes(M) ↔ cancer(M)
"Friends of smokers are likely to smoke," GROUNDED:
      wgt = 2   friends(M, M) ∧ smokes(M) → smokes(M)
      wgt = 2   friends(M, J) ∧ smokes(M) → smokes(J)
      wgt = 2   friends(J, M) ∧ smokes(J) → smokes(M)
      wgt = 2   friends(J, J) ∧ smokes(J) → smokes(J)
FACTS: friends(M, M), friends(M, J), ¬cancer(M), friends(J, M), friends(J, J), cancer(J)

A Famous MLN Example (3)
Possible complete world states (all four friends facts, ¬cancer(M), and cancer(J) hold in each):
      (1) ¬smokes(M), ¬smokes(J)
      (2) ¬smokes(M),  smokes(J)
      (3)  smokes(M), ¬smokes(J)
      (4)  smokes(M),  smokes(J)

A Famous MLN Example (4) – World State (1): ¬smokes(M), ¬smokes(J), ¬cancer(M), cancer(J)
      wgt = 3   smokes(J) ↔ cancer(J)                      F
      wgt = 3   smokes(M) ↔ cancer(M)                      T
      wgt = 2   ¬friends(M,M) ∨ ¬smokes(M) ∨ smokes(M)     T
      wgt = 2   ¬friends(M,J) ∨ ¬smokes(M) ∨ smokes(J)     T
      wgt = 2   ¬friends(J,M) ∨ ¬smokes(J) ∨ smokes(M)     T
      wgt = 2   ¬friends(J,J) ∨ ¬smokes(J) ∨ smokes(J)     T
      Sum of wgts = 0 + 3 + 2 + 2 + 2 + 2 = 11

A Famous MLN Example (5) – World State (2): ¬smokes(M), smokes(J), ¬cancer(M), cancer(J)
      wgt = 3   smokes(J) ↔ cancer(J)                      T
      wgt = 3   smokes(M) ↔ cancer(M)                      T
      wgt = 2   ¬friends(M,M) ∨ ¬smokes(M) ∨ smokes(M)     T
      wgt = 2   ¬friends(M,J) ∨ ¬smokes(M) ∨ smokes(J)     T
      wgt = 2   ¬friends(J,M) ∨ ¬smokes(J) ∨ smokes(M)     F
      wgt = 2   ¬friends(J,J) ∨ ¬smokes(J) ∨ smokes(J)     T
      Sum of wgts = 3 + 3 + 2 + 2 + 0 + 2 = 12

A Famous MLN Example (6) – World State (3): smokes(M), ¬smokes(J), ¬cancer(M), cancer(J)
      wgt = 3   smokes(J) ↔ cancer(J)                      F
      wgt = 3   smokes(M) ↔ cancer(M)                      F
      wgt = 2   ¬friends(M,M) ∨ ¬smokes(M) ∨ smokes(M)     T
      wgt = 2   ¬friends(M,J) ∨ ¬smokes(M) ∨ smokes(J)     F
      wgt = 2   ¬friends(J,M) ∨ ¬smokes(J) ∨ smokes(M)     T
      wgt = 2   ¬friends(J,J) ∨ ¬smokes(J) ∨ smokes(J)     T
      Sum of wgts = 0 + 0 + 2 + 0 + 2 + 2 = 6

A Famous MLN Example (7) – World State (4): smokes(M), smokes(J), ¬cancer(M), cancer(J)
      wgt = 3   smokes(J) ↔ cancer(J)                      T
      wgt = 3   smokes(M) ↔ cancer(M)                      F
      wgt = 2   ¬friends(M,M) ∨ ¬smokes(M) ∨ smokes(M)     T
      wgt = 2   ¬friends(M,J) ∨ ¬smokes(M) ∨ smokes(J)     T
      wgt = 2   ¬friends(J,M) ∨ ¬smokes(J) ∨ smokes(M)     T
      wgt = 2   ¬friends(J,J) ∨ ¬smokes(J) ∨ smokes(J)     T
      Sum of wgts = 3 + 0 + 2 + 2 + 2 + 2 = 11
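The four sums above, and the probabilities that the next slide derives from them, can be reproduced with a short Python script; this is a sketch written for these notes, not code from the lecture.

```python
import math

def sum_of_satisfied_weights(smokes_M, smokes_J):
    """Score one world state; the evidence (all friends facts, cancer(J), not cancer(M)) is fixed."""
    cancer_M, cancer_J = False, True
    rules = [
        (3, smokes_J == cancer_J),         # smokes(J) <-> cancer(J)
        (3, smokes_M == cancer_M),         # smokes(M) <-> cancer(M)
        (2, (not smokes_M) or smokes_M),   # friends(M,M) ^ smokes(M) -> smokes(M)
        (2, (not smokes_M) or smokes_J),   # friends(M,J) ^ smokes(M) -> smokes(J)
        (2, (not smokes_J) or smokes_M),   # friends(J,M) ^ smokes(J) -> smokes(M)
        (2, (not smokes_J) or smokes_J),   # friends(J,J) ^ smokes(J) -> smokes(J)
    ]                                      # every friends(x,y) is true, so it drops out of the clauses
    return sum(wgt for wgt, satisfied in rules if satisfied)

states = [(m, j) for m in (False, True) for j in (False, True)]
scores = {s: sum_of_satisfied_weights(*s) for s in states}     # expect 11, 12, 6, 11
Z = sum(math.exp(v) for v in scores.values())
for (m, j), v in scores.items():
    print(f"smokes(M)={m!s:5}  smokes(J)={j!s:5}  sum = {v:2d}  prob = {math.exp(v) / Z:.3f}")
```

The output matches the 0.21 / 0.58 / 0.001 / 0.21 figures derived just below.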
A Famous MLN Example (8)
Possible complete world states, the sums of the wgts of their satisfied rules, and the resulting probabilities:
      (1) ¬smokes(M), ¬smokes(J)    sum = 11    Prob = e^11 / Z ≈ 1/4.7        ≈ 0.21
      (2) ¬smokes(M),  smokes(J)    sum = 12    Prob = e^12 / Z ≈ e/4.7        ≈ 0.58
      (3)  smokes(M), ¬smokes(J)    sum =  6    Prob = e^6  / Z ≈ (1/4.7) e^-5 ≈ 0.001
      (4)  smokes(M),  smokes(J)    sum = 11    Prob = e^11 / Z ≈ 1/4.7        ≈ 0.21
      Z = e^11 + e^12 + e^6 + e^11 = e^11 (1 + e + e^-5 + 1) ≈ 4.7 e^11

A Famous MLN Example (9)
      (1) ¬smokes(M), ¬smokes(J), ¬cancer(M), cancer(J)   lost 3 pts because John had cancer yet wasn't a smoker
      (2) ¬smokes(M),  smokes(J), ¬cancer(M), cancer(J)   lost 2 pts because friends J and M had different smoking habits
      (3)  smokes(M), ¬smokes(J), ¬cancer(M), cancer(J)   lost 8 pts because of three reasons
      (4)  smokes(M),  smokes(J), ¬cancer(M), cancer(J)   lost 3 pts because Mary smoked but didn't have cancer
If we had wgt = 3 for smoking → cancer and wgt = 2 for cancer → smoking, then (1) and (4) would have scored differently (but the slides are already too crowded!)
If we had more people, we would have more clearly seen the influence of collective classification – try it yourself!

Handling Probabilistic Evidence – What if the Givens are Uncertain?
• Assume we know Prob( q(1,2) ) = 0.85
• We can represent this as
      Prob( observedQ(1,2) ) = 1.0   // ie, absolute evidence
      wgt = 2   observedQ(1,2) → q(1,2)
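A small check of this construction (a sketch written for these notes; it assumes the rule above is the only one mentioning q(1,2)): with observedQ(1,2) clamped to true, the two world states differ only in q(1,2), so Prob( q(1,2) ) = e^w / (e^w + 1). The slide's weight of 2 gives about 0.88, and a weight of ln(0.85 / 0.15) ≈ 1.7 would reproduce 0.85 exactly.

```python
import math

def prob_q(wgt):
    """observedQ(1,2) is clamped true; the only free choice is the truth value of q(1,2).
    If q(1,2) is true the rule is satisfied (score = wgt); if false it is violated (score = 0)."""
    Z = math.exp(wgt) + math.exp(0.0)
    return math.exp(wgt) / Z

print(prob_q(2.0))                     # ~0.88 with the slide's weight of 2
print(prob_q(math.log(0.85 / 0.15)))   # 0.85 with weight ln(0.85/0.15) ~ 1.7
```

In a larger MLN the marginal of q(1,2) would also depend on any other rules that mention it, so the recovered 0.85 is only approximate.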
Grounded Networks can be Very LARGE!
Given   wgt = 2   ∀x,y,z Friends(x, y) ∧ Friends(y, z) → Friends(x, z)
and a world with 10^9 people, how big is the grounded network?
10^18 nodes, since we need all groundings of Friends(?X, ?Y) (and the number of world states is 2^(10^18))
So SAMPLING methods are needed (and have been published)

Knowledge-Base Population (http://www.nist.gov/tac/2015/KBP/)
Given: a text corpus (ie, ordinary English)
Do: extract facts about people
      Born(Person, Date)
      AttendedCollege(Person, College, DateRange)
      EmployedBy(Person, Company, DateRange)
      SpouseOf(PersonA, PersonB, DateRange)
      ParentOf(PersonA, PersonB, DateRange)
      Died(Person, Date)

Sample Advice for Collective Classification?
What might we say to an ML system working on KBP? Think about constraints across the relations:
      People are only married to one person at a time.
      People usually have fewer than five children and rarely more than ten.
      Typically one graduates from college in their 20's.
      Most people only have one job at a time.
      One cannot go to college before they were born or after they died.
      Almost always your children are born after you were.
      People tend to marry people of about the same age.
      People rarely live to be over 100 years & never over 125.
      People don't marry their children.
      …
When converted to MLN notation, these sentences of common-sense knowledge improve the results of information-extraction algorithms that simply extract each relation independently (and noisily).

Scaling Up MLN Inference
(see the ICDM '12 paper by Niu et al. titled "Scaling Inference for Markov Logic via Dual Decomposition")
We successfully ran in 1 day on the Knowledge-Base Population task with
      – 240 million facts (from 500 million web articles)
      – 64 billion logic sentences in the ground MLN
      – a 5-terabyte database (from Greenplum, Inc.)
      – 256 GB RAM, 40 cores on 4 machines
      – See 'DeepDive/Wisci' at www.youtube.com/user/HazyResearch/videos

Learning MLNs
Like with Bayes nets, we need to learn
      – Structure (ie, a rule set; could be given by the user)
      – Weights (can use gradient descent)
      – There is a small literature on these tasks (some by my group)

MLN Challenges
• Estimating probabilities ('inference') can be CPU-intensive; we usually need clever sampling methods since the # of world states is O(2^N)
• Interesting direction: lifted inference (reason at the first-order level, rather than on the grounded network)
• Structure learning and refinement is a major challenge

MLN Wrapup
• Appealing combo of first-order logic and prob/stats (the two primary math underpinnings of AI)
• Impressive results on real-world tasks
• Appealing approach to 'knowledge refinement'
      1. Humans write (buggy) common-sense rules
      2. MLN algo learns weights (and maybe 'edits' rules)
• Computationally demanding (both learning MLNs and using them to answer queries)
• Other approaches to probabilistic logic exist; vibrant/exciting research area