Transcript lecture27

Today’s Topics
• Read Chapter 21 (skip Section 21.5) of textbook
• Exam THURSDAY Dec 17, 5:30-7:30pm (here)
• Review of Fall 2014 Final Dec 15
• TA Dmitry at Epic 5:30-7:30pm on Weds Dec 16?
• HW5 due Dec 8 (and no later than Dec 11)
• Probabilistic Logic
• Markov Logic Networks (MLNs)
- a popular and successful probabilistic logic
• Collective Classification
12/1/15
CS 540 - Fall 2015 (Shavlik©), Lecture 27, Week 13
1
Logic & Probability:
Two Major Math Underpinnings of AI
Statistical Relational Learning (combines Logic and Probabilities)
MLNs are a popular approach
Statistical Relational Learning
(Intro to SRL, Getoor & Taskar (eds), MIT Press, 2007)
• Pure Logic Too ‘Fragile’
everything must be either true or false
• Pure Statistics Doesn’t Capture/Accept General
Knowledge Well (tell it once rather than label N ex’s)
∀x human(x) → ∃y motherOf(x, y)
• Many Approaches Created Over the Years,
Especially Last Few
including some at UWisc
Pedro Domingos
Markov Logic Networks
(Richardson and Domingos, Machine Learning journal, 2006)
• Use FOPC, but add weights to formulae (‘syntax’)
wgt=10   ∀x,y,z motherOf(x, z) ∧ fatherOf(y, z) → married(x, y)
- weights represent ‘penalty’ if a candidate
world state violates the rule
- for ‘pure’ logic, wgt = ∞
• Formulae interpreted (‘semantics’) as compact way to specify
a type of graphical model called a Markov Net
– like a Bayes net, but undirected arcs
– probabilities in Markov nets specified by clique potentials,
but we won’t cover them in cs540
Using an MLN (‘Inference’)
• Assume we have a large knowledge base
of probabilistic logic rules
• Assume we are given the truth values
of N predicates (the ‘evidence’)
• We may be asked to estimate the most probable
joint setting for M ‘query’ predicates
• Brute-force solution
– Consider 2^M possible ‘complete world states’
– Calculate truth value of all grounded formulae in each state
– Return one with smallest total penalty for violated MLN rules
(or, equivalently, the one with largest sum of weights of satisfied rules)
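The brute-force procedure above can be sketched in a few lines of Python. This is only an illustration under assumptions: the people (Ann, Bob, Cal, Dan), the evidence facts, and the wgt=5 "not married to both" rule are hypothetical, and `map_state` is a made-up helper name. Each grounded rule is a (weight, test) pair, and the search simply tries all 2^M settings of the query atoms.

```python
from itertools import product

def map_state(query_atoms, evidence, ground_rules):
    """Brute-force MAP inference: try all 2^M settings of the query atoms."""
    best_score, best_world = float("-inf"), None
    for values in product([False, True], repeat=len(query_atoms)):
        world = dict(evidence)                   # evidence atoms stay fixed
        world.update(zip(query_atoms, values))   # one candidate complete world state
        # Score = sum of the weights of the grounded rules this world satisfies.
        score = sum(wgt for wgt, rule in ground_rules if rule(world))
        if score > best_score:
            best_score, best_world = score, world
    return best_world, best_score

# Hypothetical evidence and query atoms.
evidence = {"motherOf(Ann,Cal)": True, "fatherOf(Bob,Cal)": True}
query_atoms = ["married(Ann,Bob)", "married(Ann,Dan)"]

ground_rules = [
    # wgt=10: motherOf(Ann,Cal) ^ fatherOf(Bob,Cal) -> married(Ann,Bob)
    (10, lambda w: not (w["motherOf(Ann,Cal)"] and w["fatherOf(Bob,Cal)"])
                   or w["married(Ann,Bob)"]),
    # wgt=5 (hypothetical rule): Ann is not married to both Bob and Dan
    (5, lambda w: not (w["married(Ann,Bob)"] and w["married(Ann,Dan)"])),
]

best_world, best_score = map_state(query_atoms, evidence, ground_rules)
print(best_world["married(Ann,Bob)"], best_world["married(Ann,Dan)"], best_score)
```

With these assumed rules the best state has married(Ann,Bob) true and married(Ann,Dan) false, satisfying both rules. The loop is exponential in M, which is why this is only workable for tiny query sets.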
Probability of Candidate
World States
Prob(specific world state)
= (1/Z) exp( Σ weights of grounded formulae
that are true in this world state)
Z is a normalizing term; we need to sum over all possible
world states (challenging to estimate)
- A world state is a conjunction of predicates
(eg, married(John, Sue), …,  friends(Bill, Ann) )
- if we only want the most probable world state,
we don’t need to compute Z
If a world state violates a rule with infinite weight,
probability of that world state is zero (why?)
Grounding the MLN Formulae
(replacing variables with constants)
• Assume we have this domain knowledge
wgt = 2   ∀x,y P(x,y) → Q(x)
wgt = 3   ∀x P(x,1) → (R(1)  R(x))
• And these constants: 1 and 2
• So we have these ‘grounded’ rules (wgts not shown):
P(1,1) → Q(1)
P(1,2) → Q(1)
P(2,1) → Q(2)
P(2,2) → Q(2)
P(1,1) → R(1)
P(2,1) → (R(1)  R(2))
• Aside: Each grounded rule becomes a clique in Markov network
(like a CPT in a Bayes Net)
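Grounding is mechanical enough to sketch in code. The snippet below grounds only the first rule, P(x,y) → Q(x), over the constants 1 and 2. The string-template representation and the helper name `ground` are assumptions made for illustration, not how any particular MLN system stores rules.

```python
from itertools import product

def ground(template, variables, constants):
    """Return one grounded rule per assignment of constants to variables."""
    grounded = []
    for values in product(constants, repeat=len(variables)):
        binding = dict(zip(variables, values))   # e.g. {"x": 1, "y": 2}
        grounded.append(template.format(**binding))
    return grounded

constants = [1, 2]
rules = ground("P({x},{y}) -> Q({x})", ["x", "y"], constants)
for r in rules:
    print(r)
```

A rule with k variables over c constants yields c^k groundings, so this rule with two variables and two constants produces four grounded rules.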
Simple MLN Example
Have:
  wgt=2   P → Q
  wgt=7   P ∨ Q
Four possible world states

P      Q      Probability (unnormalized)
False  False  (1/Z) e^(2+0) ≈ 1 / e^8
False  True   (1/Z) e^(2+7) ≈ 1 / e
True   False  (1/Z) e^(0+7) ≈ 1 / e^3
True   True   (1/Z) e^(2+7) ≈ 1 / e

The normalizing term: Z = e^2 + e^9 + e^7 + e^9 ≈ e^10
What is prob(P=true & Q=false) in ‘std’ logic?
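The table for this two-rule example can be checked by direct enumeration. The sketch below assumes the two rules are P → Q (wgt=2) and P ∨ Q (wgt=7), which is the reading consistent with the exponents in the table. The function and variable names are illustrative.

```python
import math
from itertools import product

# (weight, formula) pairs; each formula maps truth values of P and Q to True/False.
rules = [
    (2, lambda p, q: (not p) or q),   # wgt=2  P -> Q
    (7, lambda p, q: p or q),         # wgt=7  P v Q
]

def satisfied_weight(p, q):
    """Sum of the weights of the rules that hold in world state (p, q)."""
    return sum(w for w, f in rules if f(p, q))

states = list(product([False, True], repeat=2))
unnormalized = {s: math.exp(satisfied_weight(*s)) for s in states}
Z = sum(unnormalized.values())
prob = {s: u / Z for s, u in unnormalized.items()}

for (p, q) in states:
    print(f"P={p!s:5} Q={q!s:5} sum={satisfied_weight(p, q)} prob={prob[(p, q)]:.4f}")
print(f"Z = {Z:.1f}, ln Z = {math.log(Z):.2f}")
```

Run exactly, ln Z comes out a little below 10, so Z ≈ e^10 is a rounded figure and the exact probabilities differ somewhat from the 1/e^8, 1/e, 1/e^3 approximations. The ranking of the four states is unchanged.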
Collective Classification
• Assume we need to predict the outputs
for N examples
• Could knowing (the probability) that example i is
true impact (the probability) that example j is true?
I.e., relaxing the iid assumption about examples
• For instance
“If Alice and Bob are friends, then if Alice likes a
movie, Bob (probably) does as well.”
Collective Classification in MLNs
• Imagine we have a bunch of inference
rules for predicting likes(Person, Food)
• We could add this to our MLN
If livesIn(?Person1, ?City) ∧ livesIn(?Person2, ?City) ∧ isaFood(?Food)
Then [ likes(?Person1, ?Food) ↔ likes(?Person2, ?Food) ] with wgt = 3
“People in the same city generally like the same sorts of food”
• So if we predicted likes of N people, the MLN would be
encouraged to give consistent answers
Don’t buy these →
A Famous MLN Example
(first rule modified to use ↔)
“Smoking frequently causes cancer.”
wgt = 3   ∀x smokes(x) ↔ cancer(x)   // Assume it’s the ONLY cause
“Friends of smokers are likely to smoke.”
wgt = 2   ∀x,y friends(x, y) ∧ smokes(x) → smokes(y)
Assume the facts below, and that we want the probs of the four
world states that differ in whether John and Mary smoke
A simple collective classification example (try yourself with THREE people!)
friends(Mary, Mary), friends(Mary, John), ¬ cancer(Mary)
friends(John, Mary), friends(John, John), cancer(John)
A Famous MLN Example (2)
“Smoking frequently causes cancer.” GROUNDED
wgt = 3 smokes(J) ↔ cancer(J)
wgt = 3 smokes(M) ↔ cancer(M)
“Friends of smokers are likely to smoke.” GROUNDED
wgt = 2   friends(M, M) ∧ smokes(M) → smokes(M)
wgt = 2   friends(M, J) ∧ smokes(M) → smokes(J)
wgt = 2   friends(J, M) ∧ smokes(J) → smokes(M)
wgt = 2   friends(J, J) ∧ smokes(J) → smokes(J)
FACTS
friends(M, M), friends(M, J), ¬ cancer(M)
friends(J, M), friends(J, J), cancer(J)
A Famous MLN Example (3)
Possible Complete World States
(1) friends(M,M), friends(M,J), friends(J,M), friends(J,J)
    ¬ smokes(M), ¬ smokes(J), ¬ cancer(Mary), cancer(John)
(2) friends(M,M), friends(M,J), friends(J,M), friends(J,J)
    ¬ smokes(M), smokes(J), ¬ cancer(Mary), cancer(John)
(3) friends(M,M), friends(M,J), friends(J,M), friends(J,J)
    smokes(M), ¬ smokes(J), ¬ cancer(Mary), cancer(John)
(4) friends(M,M), friends(M,J), friends(J,M), friends(J,J)
    smokes(M), smokes(J), ¬ cancer(Mary), cancer(John)
A Famous MLN Example (4)
Possible Complete World States
(1) friends(M,M), friends(M,J), friends(J,M), friends(J,J)
    ¬ smokes(M), ¬ smokes(J), ¬ cancer(M), cancer(J)
“Smoking frequently causes cancer.” GROUNDED
wgt = 3   smokes(J) ↔ cancer(J)   F
wgt = 3   smokes(M) ↔ cancer(M)   T
“Friends of smokers are likely to smoke.” GROUNDED
wgt = 2   ¬ friends(M,M) ∨ ¬ smokes(M) ∨ smokes(M)   T
wgt = 2   ¬ friends(M,J) ∨ ¬ smokes(M) ∨ smokes(J)   T
wgt = 2   ¬ friends(J,M) ∨ ¬ smokes(J) ∨ smokes(M)   T
wgt = 2   ¬ friends(J,J) ∨ ¬ smokes(J) ∨ smokes(J)   T
Sum of Wgts = 0 + 3 + 2 + 2 + 2 + 2 = 11
A Famous MLN Example (5)
Possible Complete World States
(2) friends(M,M), friends(M,J), friends(J,M), friends(J,J)
    ¬ smokes(M), smokes(J), ¬ cancer(M), cancer(J)
“Smoking frequently causes cancer.” GROUNDED
wgt = 3   smokes(J) ↔ cancer(J)   T
wgt = 3   smokes(M) ↔ cancer(M)   T
“Friends of smokers are likely to smoke.” GROUNDED
wgt = 2   ¬ friends(M,M) ∨ ¬ smokes(M) ∨ smokes(M)   T
wgt = 2   ¬ friends(M,J) ∨ ¬ smokes(M) ∨ smokes(J)   T
wgt = 2   ¬ friends(J,M) ∨ ¬ smokes(J) ∨ smokes(M)   F
wgt = 2   ¬ friends(J,J) ∨ ¬ smokes(J) ∨ smokes(J)   T
Sum of Wgts = 3 + 3 + 2 + 2 + 0 + 2 = 12
A Famous MLN Example (6)
Possible Complete World States
(3) friends(M,M), friends(M,J), friends(J,M), friends(J,J)
    smokes(M), ¬ smokes(J), ¬ cancer(M), cancer(J)
“Smoking frequently causes cancer.” GROUNDED
wgt = 3   smokes(J) ↔ cancer(J)   F
wgt = 3   smokes(M) ↔ cancer(M)   F
“Friends of smokers are likely to smoke.” GROUNDED
wgt = 2   ¬ friends(M,M) ∨ ¬ smokes(M) ∨ smokes(M)   T
wgt = 2   ¬ friends(M,J) ∨ ¬ smokes(M) ∨ smokes(J)   F
wgt = 2   ¬ friends(J,M) ∨ ¬ smokes(J) ∨ smokes(M)   T
wgt = 2   ¬ friends(J,J) ∨ ¬ smokes(J) ∨ smokes(J)   T
Sum of Wgts = 0 + 0 + 2 + 0 + 2 + 2 = 6
A Famous MLN Example (7)
Possible Complete World States
(4) friends(M,M), friends(M,J), friends(J,M), friends(J,J)
    smokes(M), smokes(J), ¬ cancer(M), cancer(J)
“Smoking frequently causes cancer.” GROUNDED
wgt = 3   smokes(J) ↔ cancer(J)   T
wgt = 3   smokes(M) ↔ cancer(M)   F
“Friends of smokers are likely to smoke.” GROUNDED
wgt = 2   ¬ friends(M,M) ∨ ¬ smokes(M) ∨ smokes(M)   T
wgt = 2   ¬ friends(M,J) ∨ ¬ smokes(M) ∨ smokes(J)   T
wgt = 2   ¬ friends(J,M) ∨ ¬ smokes(J) ∨ smokes(M)   T
wgt = 2   ¬ friends(J,J) ∨ ¬ smokes(J) ∨ smokes(J)   T
Sum of Wgts = 3 + 0 + 2 + 2 + 2 + 2 = 11
A Famous MLN Example (8)
Possible Complete World States
(1) friends(M,M), friends(M,J), friends(J,M), friends(J,J)
    ¬ smokes(M), ¬ smokes(J), ¬ cancer(Mary), cancer(John)
    Sum of Wgts of Satisfied Rules: 11
(2) friends(M,M), friends(M,J), friends(J,M), friends(J,J)
    ¬ smokes(M), smokes(J), ¬ cancer(Mary), cancer(John)
    Sum of Wgts of Satisfied Rules: 12
(3) friends(M,M), friends(M,J), friends(J,M), friends(J,J)
    smokes(M), ¬ smokes(J), ¬ cancer(Mary), cancer(John)
    Sum of Wgts of Satisfied Rules: 6
(4) friends(M,M), friends(M,J), friends(J,M), friends(J,J)
    smokes(M), smokes(J), ¬ cancer(Mary), cancer(John)
    Sum of Wgts of Satisfied Rules: 11
A Famous MLN Example (8)
Possible Complete World States
(1) friends(M,M), friends(M,J), friends(J,M), friends(J,J)
    ¬ smokes(M), ¬ smokes(J), ¬ cancer(Mary), cancer(John)
    Sum of Wgts of Satisfied Rules: 11   Prob = e^11 / Z ≈ 1 / 4.7 = 0.21
(2) friends(M,M), friends(M,J), friends(J,M), friends(J,J)
    ¬ smokes(M), smokes(J), ¬ cancer(Mary), cancer(John)
    Sum of Wgts of Satisfied Rules: 12   Prob = e^12 / Z ≈ e / 4.7 = 0.58
(3) friends(M,M), friends(M,J), friends(J,M), friends(J,J)
    smokes(M), ¬ smokes(J), ¬ cancer(Mary), cancer(John)
    Sum of Wgts of Satisfied Rules: 6   Prob = e^6 / Z ≈ (1 / 5) e^-5 ≈ 0.001
(4) friends(M,M), friends(M,J), friends(J,M), friends(J,J)
    smokes(M), smokes(J), ¬ cancer(Mary), cancer(John)
    Sum of Wgts of Satisfied Rules: 11   Prob = e^11 / Z ≈ 1 / 4.7 = 0.21
Z = e^11 + e^12 + e^6 + e^11 = e^11 (1 + e + e^-5 + 1) ≈ 4.7 e^11
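The four sums and probabilities for the John and Mary example can be reproduced by enumerating the smoking assignments. The sketch below hard-codes the stated facts (all four friends atoms true, cancer(J) true, cancer(M) false) and the two weighted rules. The data layout and function name are illustrative choices.

```python
import math
from itertools import product

people = ["M", "J"]                        # Mary and John
cancer = {"M": False, "J": True}           # evidence
friends = {(a, b): True for a in people for b in people}   # evidence: all four hold

def satisfied_weight(smokes):
    """Sum of the weights of the grounded rules that are true in this world state."""
    total = 0
    for x in people:                       # wgt = 3: smokes(x) <-> cancer(x)
        if smokes[x] == cancer[x]:
            total += 3
    for x, y in product(people, repeat=2): # wgt = 2: friends(x,y) ^ smokes(x) -> smokes(y)
        if not (friends[(x, y)] and smokes[x]) or smokes[y]:
            total += 2
    return total

# World states (1)-(4), in the same order as on the slides.
worlds = [dict(zip(people, vals)) for vals in product([False, True], repeat=2)]
sums = [satisfied_weight(w) for w in worlds]
Z = sum(math.exp(s) for s in sums)
probs = [math.exp(s) / Z for s in sums]

for w, s, p in zip(worlds, sums, probs):
    print(f"smokes(M)={w['M']!s:5} smokes(J)={w['J']!s:5} sum={s:2d} prob={p:.3f}")
```

Extending `people` to three names (and supplying the extra cancer facts) is one way to try the three-person exercise, at the cost of 2^3 world states.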
A Famous MLN Example (9)
(1) ¬ smokes(M), ¬ smokes(J), ¬ cancer(M), cancer(J)
lost 3 pts because John had cancer yet wasn’t a smoker
(2) ¬ smokes(M), smokes(J), ¬ cancer(M), cancer(J)
lost 2 pts because Friends J and M had diff smoking habits
(3) smokes(M), ¬ smokes(J), ¬ cancer(M), cancer(J)
lost 8 pts because of three reasons
(4) smokes(M), smokes(J), ¬ cancer(M), cancer(J)
lost 3 pts because Mary smoked but didn’t have cancer
If we had wgt=3 smoking → cancer and wgt=2 cancer → smoking
Then (1) and (4) would have scored differently (but slides already too crowded!)
If we had more people, we would have more clearly seen influence of
collective classification - try it yourself!
Handling Probabilistic Evidence
What if the Givens are Uncertain?
• Assume we know Prob( q(1,2) ) = 0.85
• We can represent this as
Prob( observedQ(1,2) ) = 1.0   // ie, absolute evidence
wgt = 2   observedQ(1,2) → q(1,2)
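The link between the rule's weight and the target probability can be made concrete. The sketch below assumes q(1,2) is influenced by no other rule: with observedQ(1,2) true, the world where q(1,2) holds scores e^w and the world where it does not scores e^0, so the weight is the log-odds of the probability. Both helper names are made up for illustration.

```python
import math

def weight_for_probability(p):
    """Weight w such that e^w / (e^w + e^0) = p, i.e. the log-odds of p."""
    return math.log(p / (1.0 - p))

def probability_for_weight(w):
    """Probability that q holds when this one rule is its only influence."""
    return math.exp(w) / (math.exp(w) + 1.0)

print(round(weight_for_probability(0.85), 2))   # weight matching Prob = 0.85
print(round(probability_for_weight(2.0), 2))    # probability implied by wgt = 2
```

Under that single-rule assumption the exact weight for 0.85 is about 1.73, and wgt = 2 corresponds to roughly 0.88, so the slide's weight reads as a rounded value.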
Grounded Networks
can be Very LARGE!
Given
wgt=2   ∀x,y,z Friends(x, y) ∧ Friends(y, z) → Friends(x, z)
and a world with 10^9 people
How big is the grounded network?
10^18 nodes since we need all groundings of Friends(?X, ?Y)
(and the number of world states is 2^(10^18))
So SAMPLING methods needed
(and have been published)
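The counts above are easy to reproduce with integer arithmetic. The grounded-rule count (one per x, y, z triple) is not on the slide and is added here as a straightforward consequence of the rule having three variables. Since 2^(10^18) cannot be written out, the sketch reports how many decimal digits it has instead.

```python
import math

n_people = 10**9

# One node per grounding of Friends(?X, ?Y): n^2 ground atoms.
n_nodes = n_people**2
# One grounded rule per (x, y, z) triple: n^3 grounded formulae.
n_ground_rules = n_people**3
# 2^(n_nodes) world states is far too large to write out, so count
# the decimal digits of that number instead (approximate, via floats).
digits_in_state_count = math.floor(n_nodes * math.log10(2)) + 1

print(f"nodes: {n_nodes:.0e}")
print(f"grounded rules: {n_ground_rules:.0e}")
print(f"decimal digits in the number of world states: {digits_in_state_count:.2e}")
```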
Knowledge-Base Population
(http://www.nist.gov/tac/2015/KBP/)
Given:
Text Corpus (ie, ordinary English)
Do:
Extract Facts about People
Born(Person, Date)
AttendedCollege(Person, College, DateRange)
EmployedBy(Person, Company, DateRange)
SpouseOf(PersonA, PersonB, DateRange)
ParentOf(PersonA, PersonB, DateRange)
Died(Person, Date)
Sample Advice for
Collective Classification?
What might we say to an ML working on KBP?
Think about constraints across the relations
People are only married to one person at a time.
People usually have fewer than five children and rarely more than ten.
Typically one graduates from college in their 20’s.
Most people only have one job at a time.
One cannot go to college before they were born or after they died.
Almost always your children are born after you were.
People tend to marry people of about the same age.
People rarely live to be over 100 years & never over 125.
People don’t marry their children.
…
Sample Advice for
Collective Classification?
What might we say to an ML working on KBP?
When converted to MLN notation, these sentences of common-sense
knowledge improve the results of information-extraction algorithms
that simply extract each relation independently (and noisily)
Scaling Up MLN Inference
(see ICDM ‘12 paper by Niu et al. titled “Scaling Inference for Markov Logic via Dual Decomposition”)
We successfully ran in 1 day on the
Knowledge Base Population task with
– 240 million facts (from 500 million web articles)
– 64 billion logic sentences in the ground MLN
– 5 terabyte database (from GreenPlum, Inc)
– 256 GB RAM, 40 cores on 4 machines
– See ‘DeepDive/Wisci’ at
www.youtube.com/user/HazyResearch/videos
Learning MLNs
Like with Bayes Nets, need to learn
– Structure (ie, a rule set; could be given by user)
– Weights (can use gradient descent)
– There is a small literature on these tasks
(some by my group)
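Weight learning by gradient methods can be sketched on a toy problem. The example below assumes maximum-likelihood training with exact enumeration of world states, which is feasible only for tiny models. The two formulae (P → Q and P ∨ Q), the training data, the learning rate, and the step count are all hypothetical. For this objective, the gradient for each weight is the observed fraction of examples satisfying the formula minus the fraction the current model expects.

```python
import math
from itertools import product

formulas = [
    lambda p, q: (not p) or q,    # P -> Q
    lambda p, q: p or q,          # P v Q
]
states = list(product([False, True], repeat=2))

def expected_counts(weights):
    """Expected truth value of each formula under the current model."""
    scores = [math.exp(sum(w for w, f in zip(weights, formulas) if f(*s)))
              for s in states]
    Z = sum(scores)
    return [sum(sc / Z for sc, s in zip(scores, states) if f(*s)) for f in formulas]

# Hypothetical training data: ten observed (P, Q) world states.
data = [(False, False)] + [(False, True)] * 4 + [(True, False)] + [(True, True)] * 4
observed = [sum(f(*s) for s in data) / len(data) for f in formulas]

weights = [0.0, 0.0]
learning_rate = 1.0
for _ in range(500):
    expected = expected_counts(weights)
    # Gradient of the average log-likelihood: observed minus expected counts.
    weights = [w + learning_rate * (o - e)
               for w, o, e in zip(weights, observed, expected)]

print([round(w, 3) for w in weights])
```

This climbs the log-likelihood, which is the same as gradient descent on its negative. At convergence the model's expected counts match the observed ones. In realistic MLNs the expectation cannot be enumerated and has to be approximated.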
MLN Challenges
• Estimating probabilities (‘inference’)
can be cpu-intensive
usually need to use clever sampling methods
since # of world states is O(2^N)
• Interesting direction: lifted inference
(reason at first-order level, rather than on grounded network)
• Structure learning and refinement
is a major challenge
MLN Wrapup
• Appealing combo of first-order logic
and prob/stats (the two primary math underpinnings of AI)
• Impressive results on real-world tasks
• Appealing approach to ‘knowledge refinement’
1. Humans write (buggy) common-sense rules
2. MLN algo learns weights (and maybe ‘edits’ rules)
• Computationally demanding
(both learning MLNs and using them to answer queries)
• Other approaches to probabilistic logic exist;
vibrant/exciting research area