Transcript: CS 540 - Fall 2015 (Shavlik), Lecture 30, Week 14 (12/8/15)

Today’s Topics
(only on final at a high level;
Sec 19.5 and Sec 18.5 readings below are ‘skim only’)
• HW5 must be turned in by 11:55pm Fri (soln out early Sat)
• Read Chapters 26 and 27 of textbook for Next Tuesday
• Exam (comprehensive, with focus on material since the midterm),
Thurs 5:30-7:30pm, in this room; two pages of notes and a
simple calculator (log, e, *, /, +, -) allowed
• Next Tues We’ll Cover My Fall 2014 Final (Spring 2013 Next Weds?)
• A Short Introduction to Inductive Logic Programming (ILP)
– Sec. 19.5 of textbook
- learning FOPC ‘rule sets’
- could, in a follow-up step, learn MLN weights on these rules
(ie, learn ‘structure’, then learn ‘weights’)
• A Short Introduction to Computational Learning Theory (COLT)
– Sec 18.5 of text
Inductive Logic Programming
(ILP)
• Use mathematical logic to
– Represent training examples
(goes beyond fixed-length feature vectors)
– Represent learned models (FOPC rule sets)
• Much ML work from the late ’70s through the early ’90s was
logic-based; then statistical ML ‘took over’
Examples in FOPC
Learned Concept:
  tower(?E) if
    on(?E, ?A, table),
    on(?E, ?B, ?A).

(not all examples have the same # of ‘features’)

PosEx1:
  on(ex1, block1, table)
  on(ex1, block2, block1)
  color(ex1, block1, blue)
  color(ex1, block2, blue)
  size(ex1, block1, large)
  size(ex1, block2, small)

PosEx2:
  < a much larger number of facts are needed to describe example2 >
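The representation above can be made concrete with a small sketch. This is a minimal Python illustration (not from the lecture; the fact encoding and the is_tower helper are my own naming) of storing ex1’s facts and testing the learned tower rule against them:

```python
# Minimal sketch (not from the lecture): encode ex1's facts as tuples and test
# the learned rule  tower(?E) if on(?E, ?A, table), on(?E, ?B, ?A).
facts = {
    ("on", "ex1", "block1", "table"),
    ("on", "ex1", "block2", "block1"),
    ("color", "ex1", "block1", "blue"),
    ("color", "ex1", "block2", "blue"),
    ("size", "ex1", "block1", "large"),
    ("size", "ex1", "block2", "small"),
}

def is_tower(example, facts):
    """True if some block A is on the table and some block B is on A."""
    on = [(block, support) for (pred, ex, block, support) in facts
          if pred == "on" and ex == example]
    return any(support == "table" and any(s == a for (_b, s) in on)
               for (a, support) in on)

print(is_tower("ex1", facts))   # True: block1 is on the table, block2 is on block1
```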
Searching for a Good Rule
(propositional-logic version)
P is always true
    P if A
    P if B
        P if B and C
        P if B and D
    P if C

(general-to-specific search: each child specializes its parent by adding one condition)
All Possible Extensions
of a Clause (capital letters are variables)
Assume we are expanding this node
q(X, Z) ← p(X, Y)
What are the possible extensions using r/3 ?
r(X,X,X)   r(Y,Y,Y)   r(Z,Z,Z)   r(1,1,1)
r(X,Y,Z)   r(Z,Y,X)   r(X,X,Y)   r(X,X,1)
r(X,Y,A)   r(X,A,B)   r(A,A,A)   r(A,B,1)
and many more …
Choose from: old variables, constants, new vars
Huge branching factor in our search!
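A minimal Python sketch (an assumption, not lecture code) that enumerates candidate r/3 literals built from old variables, constants, and fresh variables, just to show how large the branching factor gets from a single predicate:

```python
# Minimal sketch (assumption, not lecture code): enumerate candidate r/3
# literals for extending  q(X, Z) <- p(X, Y).  Each argument may be an old
# variable, a known constant, or a fresh variable.
from itertools import product

old_vars  = ["X", "Y", "Z"]
constants = ["1"]
new_vars  = ["A", "B", "C"]   # one fresh name per argument position suffices here

terms = old_vars + constants + new_vars
candidates = [f"r({a},{b},{c})" for a, b, c in product(terms, repeat=3)]

print(len(candidates))        # 7^3 = 343 literals from this single predicate
print(candidates[:3])         # ['r(X,X,X)', 'r(X,X,Y)', 'r(X,X,Z)']
# (note: r(X,Y,A) and r(X,Y,B) differ only by renaming the fresh variable, so
#  the number of truly distinct extensions is somewhat smaller, but still large)
```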
Example: ILP in the Blocks World
Consider this training set
[figure: POS and NEG block configurations, not shown in this transcript]
Can you guess an FOPC rule?
Searching for a Good Rule
(FOPC version; cap letters are vars)
Assume we have: tall(X), wide(Y), square(X), on(X,Y), red(X), green(X), blue(X), block(X)

true → POS        (covers all + and all − examples)
    on(X,Y) → POS
    blue(X) → POS
    tall(X) → POS
    …

POSSIBLE RULE LEARNED:
If on(X,Y) ∧ block(Y) ∧ blue(X) Then POS
- hard to learn with fixed-length feature vectors!
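A hedged Python sketch of this general-to-specific search (my own simplification, not the lecture’s exact algorithm): start from the empty body ‘true → POS’ and greedily add the literal that most improves a coverage score (the #pos − #neg − length score from the ILP Wrapup slide); covers() is a hypothetical helper that tests whether a rule’s body can be proved for an example.

```python
# Hedged sketch (not the lecture's exact algorithm) of greedy top-down rule
# search.  covers(rule, ex) is a hypothetical helper that proves the rule's
# body against an example's facts; candidate_literals is the (large) pool of
# literals that could be added next.
def score(rule, pos, neg, covers):
    covered_pos = sum(1 for ex in pos if covers(rule, ex))
    covered_neg = sum(1 for ex in neg if covers(rule, ex))
    return covered_pos - covered_neg - len(rule)   # scoring fn from the ILP Wrapup slide

def learn_one_rule(candidate_literals, pos, neg, covers, max_len=5):
    rule = []                                      # empty body: "true -> POS"
    while len(rule) < max_len:
        best = max(candidate_literals,
                   key=lambda lit: score(rule + [lit], pos, neg, covers))
        if score(rule + [best], pos, neg, covers) <= score(rule, pos, neg, covers):
            break                                  # no literal improves the score
        rule.append(best)
    return rule                                    # e.g. [on(X,Y), block(Y), blue(X)]
```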
Covering Algorithms
(learn a rule, then recur; so disjunctive)
[figure: a cloud of + and − examples; the examples covered by Rule 1 are set aside, and the examples still to cover are used to learn Rule 2]
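A minimal Python sketch (assumption, not lecture code) of the outer covering loop; learn_one_rule(pos, neg) and covers(rule, ex) are hypothetical helpers, e.g. the greedy search sketched earlier with its other arguments bound:

```python
# Minimal sketch (assumption, not lecture code) of the covering loop: learn one
# rule, set aside the positives it covers, and repeat, yielding a disjunctive
# set of rules.
def learn_rule_set(learn_one_rule, covers, pos, neg):
    rules, remaining_pos = [], list(pos)
    while remaining_pos:
        rule = learn_one_rule(remaining_pos, neg)
        covered = [ex for ex in remaining_pos if covers(rule, ex)]
        if not covered:
            break                                  # no progress; stop early
        rules.append(rule)                         # Rule 1, Rule 2, ...
        remaining_pos = [ex for ex in remaining_pos if ex not in covered]
    return rules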
Using Background
Knowledge (BK) in ILP
• Now consider adding some
domain knowledge about the task being learned
• For example
If Q, R, and W are all true
Then you can infer Z is true
• Can also do arithmetic, etc., in BK rule bodies
If SOME_TRIG_CALCS_OUTSIDE_OF_LOGIC
Then openPassingLane(P1, P2, Radius, Angle)
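A tiny Python sketch (my own propositional illustration, not lecture code) of how such a BK rule can add a deduced feature to an example’s facts before rule search:

```python
# Tiny sketch (my own propositional illustration): apply the BK rule
#   If Q, R, and W are all true Then Z
# to extend an example's facts with the deduced feature before rule search.
def apply_bk(facts):
    derived = set(facts)
    if {"Q", "R", "W"} <= derived:     # body of the BK rule holds
        derived.add("Z")               # deduced feature, now usable by the learner
    return derived

print(sorted(apply_bk({"Q", "R", "S", "W"})))   # ['Q', 'R', 'S', 'W', 'Z']
```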
Searching for a Good Rule
using Deduced Features (eg, Z)
P is always true
    P if Z
    P if A
    P if B
        P if B & Z
        P if B and C
        P if B and D
    P if C

Note that more BK can lead to slower learning, but hopefully less search depth is needed!
Controlling the Search
for a Good Rule
• Choose a ‘seed’ positive example, then only
consider properties that are true about this example
• Specify argument types and whether arguments
are ‘input’ (+) or ‘output’ (-)
– Only consider adding a literal
if all of its input arguments are already present in the rule
– For example
enemies(+person, -person)
Only if a variable of type PERSON is already in
the rule [eg, murdered(person)], consider
adding that person’s enemies
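A minimal Python sketch (illustrative names only; the modes table and allowed() helper are my own) of using these mode declarations to filter candidate literals:

```python
# Minimal sketch (illustrative names only): filter candidate literals using
# mode declarations such as  enemies(+person, -person).  A literal may be
# added only if, for every '+' (input) argument, a variable of that type is
# already present in the rule.
modes = {"enemies": [("person", "+"), ("person", "-")]}

def allowed(pred, types_already_in_rule):
    return all(arg_type in types_already_in_rule
               for arg_type, io in modes[pred] if io == "+")

print(allowed("enemies", set()))         # False: no person-typed variable yet
print(allowed("enemies", {"person"}))    # True: e.g. murdered(X) already bound X
```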
Formal Specification of
the ILP Task
Given
    a set of pos examples (P)
    a set of neg examples (N)
    some background knowledge (BK)
Do
    induce additional knowledge (AK)
such that
    BK ∧ AK allows all/most in P to be proved
    BK ∧ AK allows none/few in N to be proved

(Technically, the BK also contains all the facts about the pos and neg examples, plus some rules.)
ILP Wrapup
• Use best-first search with a large beam
• Commonly used scoring function:
#posExCovered - #negExCovered - ruleLength
• Performs ML without requiring
fixed-length-feature-vectors
• Produces human-readable rules
(straightforward to convert FOPC to English)
• Can be slow due to large search space
• Appealing ‘inner loop’ for probabilistic logic learning
COLT: Probably Approximately
Correct (PAC) Learning
PAC theory of learning (Valiant ’84)
Given
    C          class of possible concepts
    c ∈ C      target concept
    H          hypothesis space (usually H = C)
    ε, δ       correctness bounds
    N          polynomial number of examples
Probably Approximately Correct
(PAC) Learning
• Do: with probability 1 - δ, return an h in H
whose accuracy is at least 1 - ε
• Do this for any probability distribution
for the examples
• In other words
    Prob[error(h, c) > ε] < δ
[figure: concept c and hypothesis h drawn as overlapping regions; the shaded regions are where errors occur]
How Many Examples
Needed to be PAC?
Consider finite hypothesis spaces
Let Hbad ≡ { h1, …, hz }
• The set of hypotheses whose
(‘testset’) error is > ε
• Goal: With high prob, eliminate
all items in Hbad via (noise-free)
training examples
How Many Examples
Needed to be PAC?
How can an h look bad, even though it is
correct on all the training examples?
• If we never see any examples in the
shaded regions
• We’ll compute an N s.t. the odds of this
are sufficiently low (recall, N = number of
examples)
[figure: concept c and hypothesis h, as on the previous slide]
Hbad
({ N } denotes the set of N training examples)

• Consider H1 ∈ Hbad and ex ∈ { N }
• What is the probability that H1 is
consistent with ex ?
Prob[consistentA(ex, H1)] ≤ 1 - ε
(since H1 is bad, its error rate is at least ε)
Hbad (cont.)
What is the probability that H1 is
consistent with all N examples?
Prob[consistentB({ N }, H1)] ≤ (1 - ε)^|N|
(by iid assumption)
Hbad (cont.)
What is the probability that some member of Hbad is
consistent with the examples in { N } ?
Prob[consistentC({N}, Hbad)]
= Prob[consistentB({N}, H1) ∨ … ∨ consistentB({N}, Hz)]
≤ |Hbad| × (1 - ε)^|N|    // P(A ∨ B) = P(A) + P(B) - P(A ∧ B); ignore the P(A ∧ B) term in this upper-bound calc
≤ |H| × (1 - ε)^|N|       // since Hbad ⊆ H
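Putting the last three slides together, this is just the union bound over Hbad; a hedged LaTeX restatement in the slides’ notation:

```latex
\Pr\big[\exists\, h \in H_{bad}\ \text{consistent with all } |N| \text{ examples}\big]
  \;\le\; \sum_{h \in H_{bad}} (1-\epsilon)^{|N|}
  \;=\; |H_{bad}|\,(1-\epsilon)^{|N|}
  \;\le\; |H|\,(1-\epsilon)^{|N|}
```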
Solving for #Examples, |N|
We want
Prob[consistentC({N}, Hbad)] ≤ |H| × (1 - ε)^|N| < δ

Recall that we want the prob of a bad concept
surviving to be less than δ, our bound on
learning a poor concept
Assume that if many consistent hypotheses
survive, we get unlucky and choose a bad one
(we’re doing a worst-case analysis)
Solving for |N|
(number of examples needed to be
confident of getting a good model)
Solving,
|N| > [ ln(1/δ) + ln(|H|) ] / -ln(1 - ε)
Since ε ≤ -ln(1 - ε) over [0, 1), we get
|N| > [ ln(1/δ) + ln(|H|) ] / ε

Notice we made NO assumptions about the prob dist
of the data (other than that it does not change).

(Aside: notice that this calculation assumed we could
always find a hypothesis that fits the training data)
Example:
Number of Instances Needed
Assume
F = 100 binary features
H = all (pure) conjuncts
  [ 3^|F| possibilities (∀i: use fi, use ¬fi, or ignore fi),
    so ln |H| = |F| × ln 3 ≈ |F| ]
ε = 0.01
δ = 0.01
N = [ ln(1/δ) + ln(|H|) ] / ε = 100 × [ ln(100) + 100 ] ≈ 10^4 (checked numerically below)
But how many real-world concepts are pure conjuncts
with noise-free training data?
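A hedged Python check of this slide’s arithmetic (assuming natural logs, as in the derivation above):

```python
# Hedged check of this slide's arithmetic (assumes natural logs, as in the
# derivation on the previous slides).
import math

F = 100
ln_H = F * math.log(3)          # ln|H| = |F| * ln 3  (about 110, i.e. roughly |F|)
eps = delta = 0.01

N_exact  = (math.log(1 / delta) + ln_H) / -math.log(1 - eps)
N_approx = (math.log(1 / delta) + ln_H) / eps

print(round(N_exact))           # about 11,389
print(round(N_approx))          # about 11,447 -- both on the order of 10^4
```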
Agnostic Learning
• So far we’ve assumed we knew the concept class
- but that is unrealistic on real-world data
• In agnostic learning we relax this assumption
• We instead aim to find a hypothesis arbitrarily close
(ie, < ε error) to the best* hypothesis in
our hypothesis space
• We now need |N| ≥ [ ln(1/δ) + ln(|H|) ] / 2ε²
(denominator had been just ε before; see the numeric comparison below)

* ie, closest to the true concept
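A hedged Python comparison (same |H|, ε, δ as the previous example; my own illustration) of the realizable and agnostic sample bounds:

```python
# Hedged comparison (same |H|, eps, delta as the previous example): the 2*eps^2
# denominator of the agnostic bound demands many more examples than the
# realizable bound's eps when eps is small.
import math

ln_H = 100 * math.log(3)
eps = delta = 0.01

realizable = (math.log(1 / delta) + ln_H) / eps
agnostic   = (math.log(1 / delta) + ln_H) / (2 * eps ** 2)

print(round(realizable))   # about 11,447
print(round(agnostic))     # about 572,332 -- a factor of 1/(2*eps) = 50 more
```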
Two Senses of
Complexity
Sample complexity
(number of examples needed)
vs.
Time complexity
(time needed to find h ∈ H that is
consistent with the training examples)
- in CS, we usually only address time complexity
Complexity (cont.)
– Some concepts require a polynomial
number of examples, but an exponential
amount of time (in the worst case)
– Eg, optimally training neural networks is
NP-hard (recall BP is a ‘greedy’
algorithm that finds a local min)
Some Other COLT Topics
• COLT + clustering, k-NN, RL, SVMs, ANNs, ILP, etc.
• Average-case analysis (vs. worst case)
• Learnability of natural languages (is language innate?)
• Learnability in parallel
Summary of COLT
Strengths
• Formalizes the learning task
• Allows for imperfections (eg, ε and δ in PAC)
• Work on boosting is an excellent case of ML theory
influencing ML practice
• Shows what concepts are intrinsically hard to learn
(eg, k-term DNF*)

* though a superset of this class is PAC learnable!
Summary of COLT
Weaknesses
• Most analyses are worst case
• Hence, bounds are often much higher
than what works in practice (see the
Domingos article assigned early this semester)
• Use of ‘prior knowledge’ not
captured very well yet