bonus lecture


Not on the CS 540 final
Thirty-Two Years of
Knowledge-Based
Machine Learning
Jude Shavlik
University of Wisconsin
Key Question of AI:
How to Get Knowledge into Computers?
Hand coding
Supervised ML
How can we mix these two extremes?
Small ML subcommunity has looked at ways to do so
Slide 2
Two Underexplored
Questions in ML
• How can we go beyond teaching
machines solely via I/O pairs?
‘advice giving’
• How can we understand what an ML
algorithm has discovered?
‘rule extraction’
Slide 5
Outline
• Explanation-Based Learning (1980’s)
• Knowledge-Based Neural Nets (1990’s)
• Knowledge-Based SVMs (2000’s)
• Markov Logic Networks (2010’s)
Slide 6
Explanation-Based Learning
(EBL) – My PhD Years, 1983-1987
• The EBL Hypothesis
By understanding why an example is a member
of a concept, can learn essential properties of
the concept
• Trade-Off
Trade the need to collect many examples
for the ability to ‘explain’ single examples
(using a ‘domain theory’)
I.e., assume a smarter learner
Slide 7
Knowledge-Based Artificial
Neural Networks, KBANN (1988-2001)
(related work: Mooney, Pazzani, Cohen, etc.)

Initial Symbolic Domain Theory
→ (Insert) → Initial Neural Network
→ (Refine, using Examples) → Trained Neural Network
→ (Extract) → Final Symbolic Domain Theory
Slide 8
What Inspired KBANN?
• Geoff Hinton was an invited speaker
at ICML-88
• I recall him saying something like “one
can backprop through any function”
• And I thought “what would it mean to
backprop through a domain theory?”
Slide 9
Inserting Prior Knowledge
into a Neural Network
Domain Theory                  Neural Network
Final Conclusions         ↔    Output Units
Intermediate Conclusions  ↔    Hidden Units
Supporting Facts          ↔    Input Units
Slide 10
Jumping Ahead a Bit
• Notice that symbolic knowledge
induces a graphical model, which is
then numerically optimized
• Similar perspective later followed in
Markov Logic Networks (MLNs)
• However in MLNs, symbolic knowledge
expressed in first-order logic
Slide 11
Mapping Rules to Nets
Maps propositional rule sets to neural networks:
If A and B then Z
If B and ¬C then Z
[Figure: each rule becomes a hidden unit feeding Z. Antecedent links get
weight 4 (−4 for negated literals such as ¬C); links from irrelevant
inputs (e.g., D) get weight 0; each unit’s bias is set so it fires only
when all of its antecedents hold, and Z fires when either rule unit does.]
Slide 12
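The mapping can be sketched in code (a simplified illustration of the KBANN weight-setting scheme, not Towell’s implementation; the names `rule_to_unit` and `activate` are invented here):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

W = 4.0  # KBANN's "large" weight, so sigmoids behave almost like step functions

def rule_to_unit(pos, neg, all_features):
    """Map one propositional rule (a conjunction) to a hidden unit.
    Positive antecedents get weight +W, negated ones -W, and irrelevant
    features 0 (zero-weight links stay in the net so refinement can
    recruit them later). The bias puts the threshold halfway between
    "all antecedents hold" and "one positive antecedent is missing"."""
    weights = {f: (W if f in pos else -W if f in neg else 0.0)
               for f in all_features}
    bias = -(len(pos) - 0.5) * W
    return weights, bias

def activate(weights, bias, x):
    return sigmoid(sum(weights[f] * x[f] for f in weights) + bias)

feats = ["A", "B", "C", "D"]
h1 = rule_to_unit({"A", "B"}, set(), feats)   # If A and B then Z
h2 = rule_to_unit({"B"}, {"C"}, feats)        # If B and not-C then Z
x = {"A": 1, "B": 1, "C": 0, "D": 0}
print(round(activate(*h1, x)), round(activate(*h2, x)))  # both rule units fire: 1 1
```

With both rule units feeding Z through weight-W links and a low threshold, Z computes the disjunction of the two rules, matching the figure above.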
Case Study:
Learning to Recognize Genes
(Towell, Shavlik & Noordewier, AAAI-90)
promoter :- contact, conformation.
contact :- minus_35, minus_10.
<4 rules for conformation>
<4 rules for minus_35>
<4 rules for minus_10>
[Figure: the rules compiled into a network over the DNA sequence, with
units for minus_35, minus_10, conformation, contact, and promoter.
Halved the error rate of standard BP.]
We compile rules to a more basic language, but here we compile for
refinability.
Slide 13
Learning Curves
(similar results on many tasks and
with other advice-taking algo’s)
[Figure: testset errors vs. amount of training data, for KBANN, a
standard ANN, and the unrefined domain theory. For a fixed amount of
data KBANN has the fewest errors, and it meets a given error-rate spec
with the least training data.]
Slide 14
From Prior Knowledge
to Advice (Maclin PhD 1995)
• Originally ‘theory refinement’ community assumed
domain knowledge was available before learning starts
(prior knowledge)
• When applying KBANN to reinforcement learning,
we began to realize
you should be able to provide domain knowledge to a
machine learner whenever you think of something to say
• Changing the metaphor:
commanding vs. advising computers
• Continual (i.e., lifelong)
Human Teacher – Machine Learner Cooperation
Slide 15
What Would You
Like to Say to This Penguin?
IF   a Bee is (Near and West) and
     an Ice is (Near and North)
THEN
BEGIN
     Move East
     Move North
END
Some Advice for Soccer-Playing Robots
if   distanceToGoal ≤ 10
and  shotAngle ≥ 30
then prefer shoot over all other actions
Slide 17
Some Sample Results
[Figure: reinforcement on the testset vs. number of training episodes
(0–4000). The learner given advice reaches higher reinforcement sooner
than the learner without advice.]
Slide 18
Overcoming Bad Advice
[Figure: reinforcement on the testset vs. number of training episodes
(0–4000). The learner given bad advice starts slower but, via training
examples, recovers to match the no-advice learner.]
Rule Extraction
• Initially Geoff Towell (PhD, 1991)
viewed this as simplifying the trained
neural network (M-of-N rules)
• Mark Craven (PhD, 1996) realized
• This is simply another learning task!
• I.e., learn what the neural network computes
• Collect I/O pairs from trained neural network
• Give them to decision-tree learner
• Applies to SVMs, decision forests, etc
Slide 20
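Craven’s view of rule extraction as just another learning task can be sketched as follows (a toy illustration: the “trained network” is a hand-coded black box standing in for a real net, and a one-level decision stump stands in for a full decision-tree learner):

```python
import itertools

def trained_net(x):
    # stand-in for a trained neural network: we only query it as a black box
    A, B, C = x
    return int((A and B) or (B and not C))

# Step 1: collect I/O pairs by querying the black box over input space
examples = [(x, trained_net(x)) for x in itertools.product([0, 1], repeat=3)]

# Step 2: hand the pairs to a symbolic learner (here, a one-level stump:
# pick the single feature that best predicts the black box's output)
def best_stump(examples):
    n_feats = len(examples[0][0])
    best = None
    for f in range(n_feats):
        errs = sum(y != x[f] for x, y in examples)
        if best is None or errs < best[1]:
            best = (f, errs)
    return best

feat, errs = best_stump(examples)
# → feature 1 (B), disagreeing on 1 of the 8 inputs
print(f"extracted rule: output = feature {feat}, {errs}/8 disagreements")
```

Because the symbolic learner sees only I/O pairs, the same recipe applies unchanged to SVMs, decision forests, etc.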
KBANN Recap
• Use symbolic knowledge to make an
initial guess at the concept description
Standard neural-net approaches make a random guess
• Use training examples to refine the
initial guess (‘early stopping’ reduces overfitting)
• Nicely maps to incremental (aka online)
machine learning
• Valuable to show user the learned model
expressed in symbols rather than numbers
Knowledge-Based Support
Vector Machines (2001-2011)
• Question arose during 2001
PhD defense of Tina Eliassi-Rad
How would you apply the KBANN
idea using SVMs?
• Led to collaboration with Olvi
Mangasarian (who has worked on
SVMs for over 50 years!)
Slide 22
Generalizing the Idea of a
Training Example for SVM’s
Can extend the SVM linear program to handle ‘regions as training examples’.
[Figure: an advice region in input space, labeled “Advice: in this
region, y should exceed 4.”]

Knowledge-Based Support Vector Regression
Add soft constraints to the linear program (so the learner need only
follow advice approximately):
minimize    ||w||₁ + C ||s||₁ + penalty for violating advice
such that   f(x) = y ± s
            constraints that represent advice
Sample Advice-Taking Results
Advice:
if distanceToGoal ≤ 10 and shotAngle ≥ 30
then prefer shoot over all other actions
(given to the learner as Q(shoot) > Q(pass) and Q(shoot) > Q(move))
[Figure: Prob(Score Goal) vs. games played (0–25000) in 2-vs-1 BreakAway,
rewards +1/−1. The advice-taking learner outperforms standard RL.]
Maclin et al.: AAAI ‘05, ‘06, ‘07
Automatically
Creating Advice
Interesting approach to transfer learning (Lisa Torrey, PhD 2009):
• Learn in task A
• Perform ‘rule extraction’
• Give the result as advice for related task B
So advice giving is done by the MACHINE!
• Since advice is not assumed 100% correct, differences between tasks
A and B are handled by the training examples for task B
Slide 26
Close-Transfer Scenarios
2-on-1 BreakAway → 3-on-2 BreakAway → 4-on-3 BreakAway
Distant-Transfer Scenarios
3-on-2 KeepAway → 3-on-2 BreakAway
3-on-2 MoveDownfield → 3-on-2 BreakAway
Some Results: Transfer to
3-on-2 BreakAway
[Figure: probability of goal vs. training games (0–3000). Skill transfer
from 2-on-1 BreakAway, from 3-on-2 MoveDownfield, and from 3-on-2
KeepAway all outperform standard RL early in learning.]
Torrey, Shavlik, Walker & Maclin: ECML 2006, ICML Workshop 2006
KBSVM Recap
• Can view symbolic knowledge as a way to
label regions of feature space (rather than
solely labeling points)
• Maximize
Model Simplicity
+ Fit to Advice
+ Fit to Training Examples
• Note: does not fit view of “guess initial model,
then refine using training ex’s”
Slide 30
Markov Logic Networks,
2009+
(and statistical-relational learning in general)
• My current favorite for combining
symbolic knowledge and
numeric learning
• MLN = set of weighted FOPC sentences
wgt = 3.2   ∀x,y,z  parent(x, y) ∧ parent(z, y) → married(x, z)
• Have worked on speeding up MLN
inference (via RDBMS) plus learning
MLN rule sets
Slide 31
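The semantics can be sketched as follows: an MLN scores a possible world by summing, over each weighted formula, the weight times the number of true groundings (the three-person domain and the particular world below are invented for illustration):

```python
import itertools

people = ["ann", "bob", "cal"]
# one possible world: the ground atoms that are true in it
parent = {("ann", "cal"), ("bob", "cal")}   # ann and bob are cal's parents
married = {("ann", "bob")}

def n_true_groundings():
    """Count true groundings of
       parent(x, y) ^ parent(z, y) -> married(x, z)
       over all 27 (x, y, z) triples."""
    count = 0
    for x, y, z in itertools.product(people, repeat=3):
        body = (x, y) in parent and (z, y) in parent
        head = (x, z) in married
        if (not body) or head:      # the implication holds for this grounding
            count += 1
    return count

wgt = 3.2
score = wgt * n_true_groundings()   # this world's unnormalized log-probability
print(score)                        # 24 of the 27 groundings hold
```

Worlds that violate fewer weighted groundings get exponentially higher probability; a hard first-order constraint is the infinite-weight limit.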
Learning a Set of First-Order
Regression Trees (each path to
a leaf is an MLN rule) – ICDM ‘11
[Figure: the boosting loop. Compare the data with the probabilities
predicted by the current rules; the differences are the gradients; learn
a new regression tree to fit those gradients; add it to the model and
iterate. Final ruleset = the sum of all the trees learned along the way.]
Slide 32
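The loop can be sketched propositionally (an illustration only: one boolean feature, and a one-split stump standing in for a first-order regression tree; the pointwise gradient is y − P(y=1|x), as in functional-gradient boosting):

```python
import math

x = [1, 1, 0, 0]                  # one boolean feature per example
y = [1, 1, 1, 0]                  # class labels
psi = [0.0] * 4                   # summed regression values (log-odds) per example
sig = lambda p: 1.0 / (1.0 + math.exp(-p))

for _ in range(50):               # each iteration "learns one tree"
    grads = [yi - sig(pi) for yi, pi in zip(y, psi)]   # pointwise gradients
    for v in (0, 1):              # the stump has one leaf per feature value
        idx = [i for i in range(4) if x[i] == v]
        leaf = sum(grads[i] for i in idx) / len(idx)   # leaf value = mean gradient
        for i in idx:
            psi[i] += leaf        # final model = sum of all the stumps

print([round(sig(p), 2) for p in psi])
```

The x=1 examples (both positive) are pushed toward probability 1, while the x=0 examples (one positive, one negative) settle at 0.5, the best a single split can do.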
Some Results (advisedBy task)

            AUC-PR         CLL             Time
MLN-BT      0.94 ± 0.06    -0.52 ± 0.45    18.4 sec
MLN-BC      0.95 ± 0.05    -0.30 ± 0.06    33.3 sec
Alch-D      0.31 ± 0.10    -3.90 ± 0.41    7.1 hrs
Motif       0.43 ± 0.03    -3.23 ± 0.78    1.8 hrs
LHL         0.42 ± 0.10    -2.94 ± 0.31    37.2 sec
Slide 33
Differences
from KBANN
• Rules involve logical variables
• During learning, we create new rules to
correct errors in initial rules
• Recent followup: also refine initial rules
(note that KBSVMs also do NOT refine rules,
though we had one AAAI paper on that)
Slide 34
Wrapping Up
• Symbolic knowledge refined/extended by
Neural networks
Support-vector machines
MLN rule and weight learning
Applications in genetics,
cancer, machine reading,
robot learning, etc
• Variety of views taken
Make initial guess at concept, then refine weights
Use advice to label a region in feature space
Make initial guess at concept, then add wgt’ed rules
• Seeing what was learned – rule extraction
Slide 35
Some Suggestions
• Allow humans to continually observe
learning and provide symbolic knowledge
at any time
• Never assume symbolic knowledge
is 100% correct
• Allow user to see what was learned in a
symbolic representation to facilitate
additional advice
• Put a graphic on every slide ☺
Slide 36