Transcript Document

3. Inductive Learning from Examples:
Version space learning
Prof. Gheorghe Tecuci
Learning Agents Laboratory
Computer Science Department
George Mason University
© 2003, G. Tecuci, Learning Agents Laboratory
Overview
Instances, concepts and generalization
Concept learning from examples
Version spaces and the candidate elimination algorithm
The LEX system
The learning bias
Discussion
Recommended reading
 2003, G.Tecuci, Learning Agents Laboratory
2
Basic ontological elements: instances and concepts
An instance is a representation of a particular entity from the application
domain.
A concept is a representation of a set of instances.
Example: government_of_US_1943 instance_of state_government; government_of_Britain_1943 instance_of state_government.
“instance_of” is the relationship between an instance and the concept to
which it belongs.
“state_government” represents the set of all entities that are governments
of states. This set includes “government_of_US_1943” and
“government_of_Britain_1943” which are called positive examples.
An entity which is not an instance of a concept is called a negative
example of that concept.
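A minimal sketch (Python) of this extensional view, using the names from this slide: a concept is simply the set of its instances, instance_of is set membership, and a negative example is any entity outside the set (Allied_Forces_1943, an alliance from a later slide, is not a state government).

```python
# The concept is represented extensionally, as the set of its known instances.
state_government = {"government_of_US_1943", "government_of_Britain_1943"}

def instance_of(entity, concept):
    """True if the entity belongs to the set of instances represented by the concept."""
    return entity in concept

print(instance_of("government_of_US_1943", state_government))   # True: a positive example
print(instance_of("Allied_Forces_1943", state_government))      # False: a negative example
```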
 2003, G.Tecuci, Learning Agents Laboratory
3
Concept generality
A concept P is more general than another concept Q if and only if the set of
instances represented by P includes the set of instances represented by Q.
Example:
state_government is more general than democratic_government and totalitarian_government; democratic_government is in turn more general than representative_democracy and parliamentary_democracy.

"subconcept_of" is the relationship between a concept and a more general concept (e.g., democratic_government subconcept_of state_government).
 2003, G.Tecuci, Learning Agents Laboratory
5
A generalization hierarchy
[Figure: a generalization hierarchy rooted at governing_body, which branches into ad_hoc_governing_body, established_governing_body, group_governing_body, and other_type_of_governing_body. state_government specializes into democratic_government (with representative_democracy and parliamentary_democracy), totalitarian_government (with police_state, military_dictatorship, religious_dictatorship, fascist_state, and communist_dictatorship), monarchy, theocratic_government, feudal_god_king_government, and other_state_government. Other concepts shown include dictator, deity_figure, autocratic_leader, democratic_council_or_board, chief_and_tribal_council, theocratic_democracy, and other_group_governing_body. Instances shown as leaves include government_of_US_1943, government_of_Britain_1943, government_of_Italy_1943, government_of_Germany_1943, and government_of_USSR_1943.]
Empirical inductive concept learning from examples
Illustration
Given
Positive examples of cups: P1, P2, ...
Negative examples of cups: N1, ...
Learn
A description of the cup concept: has-handle(x), ...
Approach:
Compare the positive and the negative examples of a
concept, in terms of their similarities and differences, and
learn the concept as a generalized description of the
similarities of the positive examples.
Why is Concept Learning important?
Concept Learning allows the agent to recognize other
entities as being instances of the learned concept.
 2003, G.Tecuci, Learning Agents Laboratory
10
The learning problem
Given
• a language of instances;
• a language of generalizations;
• a set of positive examples (E1, ..., En) of a concept
• a set of negative examples (C1, ... , Cm) of the same concept
• a learning bias
• other background knowledge
Determine
• a concept description which is a generalization of the positive
examples that does not cover any of the negative examples
Purpose of concept learning
Predict if an instance is an example of the learned concept.
 2003, G.Tecuci, Learning Agents Laboratory
11
Generalization and specialization rules
Learning a concept from examples is based on
generalization and specialization rules.
A generalization rule is a rule that transforms an
expression into a more general expression.
A specialization rule is a rule that transforms an
expression into a less general expression.
The reverse of any generalization rule is a
specialization rule.
 2003, G.Tecuci, Learning Agents Laboratory
12
Discussion
Indicate several generalizations of the following sentence:
Students who have lived in Fairfax for more than 3 years.
Indicate several specializations of the following sentence:
Students who have lived in Fairfax for more than 3 years.
 2003, G.Tecuci, Learning Agents Laboratory
13
Generalization (and specialization) rules
Turning constants into variables
Climbing the generalization hierarchy
Dropping conditions
Generalizing numbers
Adding alternatives
 2003, G.Tecuci, Learning Agents Laboratory
14
Turning constants into variables
Generalizes an expression by replacing a constant
with a variable.
The set of multi_group_forces with 5 subgroups (e.g., Japan_1944_Armed_Forces):
?O1 is multi_group_force
    number_of_subgroups 5

generalization: 5 → ?N1        specialization: ?N1 → 5

The set of multi_group_forces with any number of subgroups (e.g., Axis_forces_Sicily, Allied_forces_operation_Husky):
?O1 is multi_group_force
    number_of_subgroups ?N1
 2003, G.Tecuci, Learning Agents Laboratory
15
Climbing the generalization hierarchies
Generalizes an expression by replacing a concept with
a more general one.
Hierarchy fragment: democratic_government is more general than representative_democracy and parliamentary_democracy.

The set of single state forces governed by representative democracies:
?O1 is single_state_force
    has_as_governing_body ?O2
?O2 is representative_democracy

generalization: representative_democracy → democratic_government
specialization: democratic_government → representative_democracy

The set of single state forces governed by democracies:
?O1 is single_state_force
    has_as_governing_body ?O2
?O2 is democratic_government
Dropping conditions
Generalizes an expression by removing a constraint
from its description.
The set of multi-member forces that have international legitimacy:
?O1 is multi_member_force
    has_international_legitimacy "yes"

generalization: drop the has_international_legitimacy condition
specialization: add the condition back

The set of multi-member forces (that may or may not have international legitimacy):
?O1 is multi_member_force
 2003, G.Tecuci, Learning Agents Laboratory
19
Extending intervals
Generalizes an expression by replacing a number with an
interval, or by replacing an interval with a larger interval.
The set of multi_group_forces with exactly 5 subgroups:
?O1 is multi_group_force
    number_of_subgroups 5

generalization: 5 → [3 .. 7]        specialization: [3 .. 7] → 5

The set of multi_group_forces with at least 3 subgroups and at most 7 subgroups:
?O1 is multi_group_force
    number_of_subgroups ?N1
?N1 is-in [3 .. 7]

generalization: [3 .. 7] → [2 .. 10]        specialization: [2 .. 10] → [3 .. 7]

The set of multi_group_forces with at most 10 subgroups:
?O1 is multi_group_force
    number_of_subgroups ?N1
?N1 is-in [2 .. 10]
Adding alternatives
Generalizes an expression by replacing a concept C1 with
the union (C1 U C2), which is a more general concept.
The set of alliances:
?O1 is alliance
    has_as_member ?O2

generalization: alliance → alliance OR coalition
specialization: alliance OR coalition → alliance

The set including both the alliances and the coalitions:
?O1 is alliance OR coalition
    has_as_member ?O2
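A sketch (Python; the representation and helper names are mine, not the lecture's) of the generalization rules illustrated on the preceding slides, applied to concept descriptions written as dictionaries of feature constraints. Each function returns a strictly more general description; reading each transformation in reverse gives the corresponding specialization rule.

```python
# A concept description is a dict: feature -> constraint.
# A constraint is a set of admissible concepts, a number, an ("interval", lo, hi), or "?" (anything).

PARENT = {"representative_democracy": "democratic_government",   # fragment of a hierarchy
          "parliamentary_democracy": "democratic_government"}

def turn_constant_into_variable(concept, feature):
    """Replace a specific value by '?' (any value), e.g. number_of_subgroups 5 -> ?N1."""
    return {**concept, feature: "?"}

def climb_hierarchy(concept, feature):
    """Replace each concept value by its parent in the generalization hierarchy."""
    return {**concept, feature: {PARENT.get(v, v) for v in concept[feature]}}

def drop_condition(concept, feature):
    """Remove one constraint entirely."""
    return {k: v for k, v in concept.items() if k != feature}

def extend_interval(concept, feature, lo, hi):
    """Replace a number (or a smaller interval) by the interval [lo .. hi]."""
    return {**concept, feature: ("interval", lo, hi)}

def add_alternative(concept, feature, extra_value):
    """Replace a set of alternatives C1 by the more general union C1 U {C2}."""
    return {**concept, feature: concept[feature] | {extra_value}}

c = {"is": {"alliance"}, "number_of_subgroups": 5,
     "has_as_governing_body": {"representative_democracy"},
     "has_international_legitimacy": {"yes"}}
c = extend_interval(c, "number_of_subgroups", 3, 7)    # 5 -> [3 .. 7]
c = climb_hierarchy(c, "has_as_governing_body")        # representative_democracy -> democratic_government
c = drop_condition(c, "has_international_legitimacy")  # drop the legitimacy condition
c = add_alternative(c, "is", "coalition")              # alliance -> alliance OR coalition
print(c)
```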
 2003, G.Tecuci, Learning Agents Laboratory
22
Generalization and specialization rules
Turning constants into variables
Turning variables into constants
Climbing the generalization hierarchies
Descending the generalization hierarchies
Dropping conditions
Adding conditions
Extending intervals
Reducing intervals
Adding alternatives
Dropping alternatives
 2003, G.Tecuci, Learning Agents Laboratory
23
Types of generalizations and specializations
Operational definition of generalization/specialization
Generalization/specialization of two concepts
Minimally general generalization of two concepts
Maximally general specialization of two concepts
Least general generalization of two concepts
 2003, G.Tecuci, Learning Agents Laboratory
24
Operational definition of generalization
Non-operational definition:
A concept P is said to be more general than another concept Q if
and only if the set of instances represented by P includes the set of
instances represented by Q.
Why isn’t this an operational definition?
This definition is not operational because it requires showing that each instance from the potentially infinite set represented by Q also belongs to the set represented by P.
Operational definition:
A concept P is said to be more general than another concept Q if
and only if Q can be transformed into P by applying a sequence of
generalization rules.
 2003, G.Tecuci, Learning Agents Laboratory
25
Generalization of two concepts
How would you define this?
Definition:
The concept Cg is a generalization of the concepts C1 and C2 if and
only if Cg is more general than C1 and Cg is more general than C2.
Example: MANEUVER-UNIT is a generalization of ARMORED-UNIT and INFANTRY-UNIT.
Is the above definition operational?
Operational definition:
The concept Cg is a generalization of the concepts C1 and C2 if and
only if both C1 and C2 can be transformed into Cg by applying generalization rules (assuming the existence of a complete set of rules).
 2003, G.Tecuci, Learning Agents Laboratory
26
Generalization of two concepts: example
C1:
?O1 IS COURSE-OF-ACTION
    TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS 10
    TYPE OFFENSIVE

C2:
?O1 IS COURSE-OF-ACTION
    TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS 5

To generalize C1: generalize 10 to [5 .. 10] and drop "?O1 TYPE OFFENSIVE".
To generalize C2: generalize 5 to [5 .. 10].

C:
?O1 IS COURSE-OF-ACTION
    TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS ?N1
?N1 IS-IN [5 .. 10]

Remark: COA = Course of Action
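A sketch (Python; an illustrative representation, not the lecture's notation) of how a generalization of the two COA concepts above could be computed: shared numeric features are generalized to the interval spanning both values, and conditions appearing in only one concept are dropped.

```python
def generalize(c1, c2):
    """Return a generalization of two concepts, each a dict: feature -> value."""
    g = {}
    for feature in c1.keys() & c2.keys():        # conditions present in only one concept are dropped
        v1, v2 = c1[feature], c2[feature]
        if v1 == v2:
            g[feature] = v1                      # identical conditions are kept unchanged
        elif isinstance(v1, (int, float)) and isinstance(v2, (int, float)):
            g[feature] = (min(v1, v2), max(v1, v2))   # two numbers -> the interval [min .. max]
        # differing symbolic values would be climbed in a generalization hierarchy (not shown)
    return g

C1 = {"IS": "COURSE-OF-ACTION", "TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS": 10, "TYPE": "OFFENSIVE"}
C2 = {"IS": "COURSE-OF-ACTION", "TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS": 5}
print(generalize(C1, C2))
# IS: COURSE-OF-ACTION, TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS: (5, 10); the TYPE condition is dropped
```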
Specialization of two concepts
Definition:
The concept Cs is a specialization of the concepts C1 and C2 if and
only if Cs is less general than C1 and Cs is less general than C2.
Example: PENETRATE-MILITARY-TASK is a specialization of MILITARY-MANEUVER and MILITARY-ATTACK.
Operational definition:
The concept Cs is a specialization of the concepts C1 and C2 if and
only if both C1 and C2 can be transformed into Cs by applying
specialization rules (or Cs can be transformed into both C1 and into
C2 by applying generalization rules).
This assumes a complete set of rules.
 2003, G.Tecuci, Learning Agents Laboratory
28
Other useful definitions
Minimally general generalization
The concept G is a minimally general generalization of A and
B if and only if G is a generalization of A and B, and G is not
more general than any other generalization of A and B.
Least general generalization
If there is only one minimally general generalization of two
concepts A and B, then this generalization is called the least
general generalization of A and B.
Maximally general specialization
The concept C is a maximally general specialization of two
concepts A and B if and only if C is a specialization of A and B
and no other specialization of A and B is more general than C.
Specialization of a concept with a negative example
 2003, G.Tecuci, Learning Agents Laboratory
29
Concept learning: another illustration
Positive examples:
Allied_Forces_1943 is equal_partner_multi_state_alliance
                   has_as_member US_1943
European_Axis_1943 is dominant_partner_multi_state_alliance
                   has_as_member Germany_1943

Negative examples:
Somali_clans_1992 is equal_partner_multi_group_coalition
                  has_as_member Isasq_somali_clan_1992

Concept learned by a cautious learner:
?O1 is multi_state_alliance
    has_as_member ?O2
?O2 is single_state_force

A multi-state alliance that has as member a single state force.
 2003, G.Tecuci, Learning Agents Laboratory
30
Discussion
What could be said about the predictions of
a cautious learner?
[Figure: the concept learned by a cautious learner is included in the concept to be learned, i.e., it is less general than the target concept.]
 2003, G.Tecuci, Learning Agents Laboratory
31
Concept learning: yet another illustration
Positive examples:
Allied_Forces_1943 is equal_partner_multi_state_alliance
                   has_as_member US_1943
European_Axis_1943 is dominant_partner_multi_state_alliance
                   has_as_member Germany_1943

Negative examples:
Somali_clans_1992 is equal_partner_multi_group_coalition
                  has_as_member Isasq_somali_clan_1992

Concept learned by an aggressive learner:
?O1 is multi_member_force
    has_as_member ?O2
?O2 is single_state_force

A multi-member force that has as member a single state force.
 2003, G.Tecuci, Learning Agents Laboratory
33
Discussion
What could be said about the predictions of
an aggressive learner?
[Figure: the concept to be learned is included in the concept learned by an aggressive learner, i.e., the learned concept is more general than the target concept.]
 2003, G.Tecuci, Learning Agents Laboratory
34
Discussion
How could one synergistically integrate a cautious learner
with an aggressive learner to take advantage of their
qualities to compensate for each other’s weaknesses?
[Figure: the concept to be learned contains the concept learned by a cautious learner and is contained in the concept learned by an aggressive learner.]
 2003, G.Tecuci, Learning Agents Laboratory
36
Basic idea of version space concept learning
Consider the examples E1, …, En in sequence.
Initialize the lower bound to the first
positive example (LB=E1) and the
upper bound (UB) to the most
general generalization of E1.
If the next example is a positive one,
then generalize LB as little as
possible to cover it.
If the next example is a negative one,
then specialize UB as little as
possible to uncover it and to remain
more general than LB.
Repeat the above two steps with the rest of the examples until UB = LB.
This is the learned concept.
 2003, G.Tecuci, Learning Agents Laboratory
The candidate elimination algorithm (Mitchell, 1978)
Let us suppose that we have
an example e1 of a concept
to be learned. Then, any
sentence of the
representation language
which is more general than this example is a plausible hypothesis for the concept.
The version space is:
H = { h | h is more general than e1 }
 2003, G.Tecuci, Learning Agents Laboratory
40
The candidate elimination algorithm (cont.)
As new examples and
counterexamples are
presented to the
program, candidate
concepts are eliminated
from H.
This is practically done
by updating the set G
(which is the set of the
most general elements in
H) and the set S (which is
the set of the most
specific elements in H).
 2003, G.Tecuci, Learning Agents Laboratory
[Figure: the version space H, ordered by generality from the upper bound UB (the set G of the most general hypotheses) down to the lower bound LB (the set S of the most specific ones).]
The candidate elimination algorithm
1. Initialize S to the first positive example and G to its most general
generalization
2. Accept a new training instance I
• If I is a positive example then
- remove from G all the concepts that do not cover I;
- generalize the elements in S as little as possible to cover I
but remain less general than some concept in G;
- keep in S the minimally general concepts.
• If I is a negative example then
- remove from S all the concepts that cover I;
- specialize the elements in G as little as possible to uncover
I and be more general than at least one element from S;
- keep in G the maximally general concepts.
3. Repeat 2 until G=S and they contain a single concept C (this is
the learned concept)
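A sketch in Python of the algorithm above, for conjunctive hypotheses over tree-structured attribute hierarchies, using the (shape, size) language of the following illustration. The data structures and helper names are mine, not part of the lecture or of Mitchell's original formulation.

```python
# A hypothesis is a tuple of concepts, one per attribute; an instance is a tuple of leaf values.
PARENT = {                                     # tree-structured generalization hierarchies
    "shape": {"ball": "any-shape", "brick": "any-shape", "cube": "any-shape"},
    "size":  {"large": "any-size", "small": "any-size"},
}
ATTRS = ("shape", "size")
TOP = ("any-shape", "any-size")                # the most general generalization

def ancestors(attr, value):
    """The value itself plus every concept above it in the hierarchy."""
    chain = [value]
    while value in PARENT[attr]:
        value = PARENT[attr][value]
        chain.append(value)
    return chain

def covers(h, x):
    return all(h[i] in ancestors(a, x[i]) for i, a in enumerate(ATTRS))

def more_general_or_equal(h1, h2):
    return all(h1[i] in ancestors(a, h2[i]) for i, a in enumerate(ATTRS))

def minimal_generalization(s, x):
    """Climb each attribute of s just enough to cover x."""
    return tuple(next(v for v in ancestors(a, s[i]) if v in ancestors(a, x[i]))
                 for i, a in enumerate(ATTRS))

def minimal_specializations(g, x):
    """Replace one attribute of g by an immediate child concept that no longer covers x."""
    specs = []
    for i, a in enumerate(ATTRS):
        for child, parent in PARENT[a].items():
            if parent == g[i] and child not in ancestors(a, x[i]):
                specs.append(g[:i] + (child,) + g[i + 1:])
    return specs

def candidate_elimination(examples):
    (first, _), *rest = examples               # step 1: S = first (positive) example, G = TOP
    S, G = {first}, {TOP}
    for x, positive in rest:                   # step 2: process the remaining examples
        if positive:
            G = {g for g in G if covers(g, x)}
            S = {minimal_generalization(s, x) for s in S}
            S = {s for s in S if any(more_general_or_equal(g, s) for g in G)}
        else:
            S = {s for s in S if not covers(s, x)}
            G = ({g for g in G if not covers(g, x)}
                 | {h for g in G if covers(g, x) for h in minimal_specializations(g, x)
                    if any(more_general_or_equal(h, s) for s in S)})
            G = {g for g in G if not any(h != g and more_general_or_equal(h, g) for h in G)}
    return S, G

# The (shape, size) illustration from the next slide:
examples = [(("ball", "large"), True), (("brick", "small"), False),
            (("cube", "large"), False), (("ball", "small"), True)]
print(candidate_elimination(examples))         # S = G = {('ball', 'any-size')}
```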
 2003, G.Tecuci, Learning Agents Laboratory
43
Illustration of the candidate elimination algorithm
Language of instances: (shape, size)
  shape: {ball, brick, cube}
  size: {large, small}

Language of generalizations: (shape, size)
  shape: {ball, brick, cube, any-shape}
  size: {large, small, any-size}

Input examples:
shape   size    class
ball    large   +
brick   small   -
cube    large   -
ball    small   +

Learning process:
1. +(ball, large):   S = {(ball, large)}      G = {(any-shape, any-size)}
2. -(brick, small):  S = {(ball, large)}      G = {(ball, any-size), (any-shape, large)}
3. -(cube, large):   S = {(ball, large)}      G = {(ball, any-size)}
4. +(ball, small):   S = {(ball, any-size)} = G = {(ball, any-size)}   (the learned concept)
The LEX system
LEX is a system that uses the version space method to learn heuristics that suggest when the integration operators should be applied in solving symbolic integration problems.
The problem of learning control heuristics
Given
Operators for symbolic integration:
OP1: ∫ r f(x) dx --> r ∫ f(x) dx
OP2: ∫ u dv --> uv - ∫ v du, where u = f1(x) and dv = f2(x) dx
OP3: 1 · f(x) --> f(x)
OP4: ∫ (f1(x) + f2(x)) dx --> ∫ f1(x) dx + ∫ f2(x) dx
OP5: ∫ sin(x) dx --> -cos(x) + C
OP6: ∫ cos(x) dx --> sin(x) + C
Find
Heuristics for applying the operators, such as the following one:
To solve ∫ rx transc(x) dx apply OP2 with u=rx and dv=transc(x)dx
 2003, G.Tecuci, Learning Agents Laboratory
49
Remarks
The integration operators assure a satisfactory level of competence to the LEX system. That is, LEX is able in principle to solve a significant class of symbolic integration problems. However, in practice it may not be able to solve many of these problems, because doing so would require too much time and space.
The description of an operator shows when the operator is applicable, while
a heuristic associated with an operator shows when the operator should be
applied, in order to solve a problem.
LEX tries to discover, for each operator OPi, the definition of the concept:
situations in which OPi should be used.
 2003, G.Tecuci, Learning Agents Laboratory
50
The architecture of LEX
[Figure: the architecture of LEX, with its four modules and the questions they raise.]

PROBLEM GENERATOR (4. How to generate a new problem?) — proposes a new problem, e.g. ∫ 3x cos(x) dx.

PROBLEM SOLVER (1. What search strategy to use for problem solving?) — solves the problem:
∫ 3x cos(x) dx --> (OP2 with u = 3x, dv = cos(x) dx) 3x sin(x) - ∫ 3 sin(x) dx
--> (OP1) 3x sin(x) - 3 ∫ sin(x) dx --> (OP5) 3x sin(x) + 3cos(x) + C

CRITIC (2. How to characterize individual problem solving steps?) — extracts training instances from the solution; one of the suggested positive training instances:
∫ 3x cos(x) dx --> apply OP2 with u = 3x, dv = cos(x) dx

LEARNER (3. How to learn from these steps? How is the initial VS defined?) — refines the version space of the proposed heuristic:
G: ∫ f1(x) f2(x) dx --> apply OP2 with u = f1(x), dv = f2(x) dx
S: ∫ 3x cos(x) dx --> apply OP2 with u = 3x, dv = cos(x) dx
Generalization hierarchy for functions

[Figure: the most general class f covers the operators (op: +, -, *, /, ^, …) and the primitive functions (prim). prim splits into poly (e.g. monom: rx, rx^n, kx, kx^n, …) and transc, which splits into trig (sin, cos, tan) and explog (exp, ln).]
Illustration of the learning process
Continue learning of the heuristic for applying OP2:
The problem generator generates a new problem to solve that is useful for learning.
The problem solver solves this problem.
The critic extracts positive and negative examples from the problem solving tree.
The learner refines the version space of the heuristic.
 2003, G.Tecuci, Learning Agents Laboratory
54
The learning bias
A bias is any basis for choosing one generalization
over another, other than strict consistency with the
observed training examples.
Types of bias:
- restricted hypothesis space bias;
- preference bias.
 2003, G.Tecuci, Learning Agents Laboratory
58
Restricted hypothesis space bias
The hypothesis space H (i.e. the space containing all
the possible concept descriptions) is defined by the
generalization language. This language may not be
capable of expressing all possible classes of instances.
Consequently, the hypothesis space in which the
concept description is searched is restricted.
Some of the restricted spaces investigated:
- logical conjunctions (i.e. the learning system will look for a
concept description in the form of a conjunction);
- linear threshold functions (for exemplar-based
representations);
- three-layer neural networks with a fixed number of hidden
units.
 2003, G.Tecuci, Learning Agents Laboratory
59
Restricted hypothesis space bias: example
The language of instances consists of triples of bits as, for
example: (0, 1, 1), (1, 0, 1).
How many concepts are in this space?
There are 2^3 = 8 instances, so the total number of subsets of instances is 2^8 = 256.
The language of generalizations consists of triples of 0, 1, and
*, where * means any bit, for example: (0, *, 1), (*, 0, 1).
How many concepts could be represented in this language?
This hypothesis space consists of 3 × 3 × 3 = 27 elements.
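A tiny sketch (Python) that makes the two counts above concrete by enumeration: the 8 instances give 2^8 = 256 possible classes, while the restricted generalization language can express only 27 of them.

```python
from itertools import product

instances = list(product([0, 1], repeat=3))           # 2**3 = 8 instances
print(2 ** len(instances))                            # 256 possible classes (subsets of instances)

hypotheses = list(product([0, 1, "*"], repeat=3))     # 3**3 = 27 expressible generalizations
print(len(hypotheses))

def covers(h, x):
    return all(hv == "*" or hv == xv for hv, xv in zip(h, x))

# Each hypothesis denotes a subset of instances; count the distinct subsets that are expressible.
print(len({frozenset(x for x in instances if covers(h, x)) for h in hypotheses}))   # 27
```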
 2003, G.Tecuci, Learning Agents Laboratory
60
Preference bias
A preference bias places a preference ordering over the
hypotheses in the hypothesis space H. The learning
algorithm can then choose the most preferred
hypothesis f in H that is consistent with the training
examples, and produce this hypothesis as its output.
Most preference biases attempt to minimize some measure of syntactic
complexity of the hypothesis representation (e.g. shortest logical
expression, smallest decision tree).
These are variants of Occam's Razor, which is the bias first defined by
William of Occam (1300-1349):
Given two explanations of data, all other things being equal, the
simpler explanation is preferable.
 2003, G.Tecuci, Learning Agents Laboratory
61
Preference bias: representation
How could the preference bias be represented?
In general, the preference bias may be implemented as an order
relationship 'better(f1, f2)' over the hypothesis space H.
Then, the system will choose the "best" hypothesis f, according to the
"better" relationship.
An example of such a relationship:
"less-general-than" which produces the least general expression
consistent with the data.
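A sketch (Python; hypothetical names) of a preference bias implemented as a 'better' ordering: among the hypotheses consistent with the training examples, the learner outputs the one that is best under the ordering, here an Occam-style preference for the syntactically simplest hypothesis (fewest conditions).

```python
def covers(h, x):
    """h is a set of (attribute, required value) conditions; x is a dict of attribute values."""
    return all(x.get(a) == v for a, v in h)

def consistent(h, examples):
    return all(covers(h, x) == positive for x, positive in examples)

def better(h1, h2):
    """Occam-style preference: the hypothesis with fewer conditions is preferred."""
    return len(h1) < len(h2)

def choose_hypothesis(hypothesis_space, examples):
    """Return the most preferred hypothesis that is consistent with the training examples."""
    candidates = [h for h in hypothesis_space if consistent(h, examples)]
    best = candidates[0]
    for h in candidates[1:]:
        if better(h, best):
            best = h
    return best

# Toy usage: two consistent hypotheses, the simpler one wins.
H = [frozenset({("color", "red")}),
     frozenset({("color", "red"), ("size", "small")})]
examples = [({"color": "red", "size": "small"}, True),
            ({"color": "blue", "size": "small"}, False)]
print(choose_hypothesis(H, examples))          # the single-condition hypothesis is preferred
```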
 2003, G.Tecuci, Learning Agents Laboratory
62
Problem
Language of instances:
An instance is defined by a triplet of the form (specific-color, specific-shape, specific-size)
Language of generalization: (color-concept, shape-concept, size-concept)
Set of examples:
color    shape      size   class
orange   square     large  + (i1)
blue     ellipse    small  - (i2)
red      triangle   small  + (i3)
green    rectangle  small  - (i4)
yellow   circle     large  + (i5)
Background knowledge:
any-color:  warm-color (red, orange, yellow),  cold-color (blue, green, black)
any-shape:  polygon (square, triangle, rectangle),  round (circle, ellipse)
any-size:   large, small
Task:
Apply the candidate elimination algorithm to learn the concept represented by the
above examples.
 2003, G.Tecuci, Learning Agents Laboratory
64
Solution:
+i1: (color = orange) & (shape = square) & (size = large)
S: {[(color = orange) & (shape = square) & (size = large)]}
G: {[(color = any-color) & (shape = any-shape) & (size = any-size)]}
-i2: (color = blue) & (shape = ellipse) & (size = small)
S: {[(color = orange) & (shape = square) & (size = large)]}
G: {[(color = warm-color) & (shape = any-shape) & (size = any-size)],
[(color = any-color) & (shape = polygon) & (size = any-size)],
[(color = any-color) & (shape = any-shape) & (size = large)]}
+i3: (color = red) & (shape = triangle) & (size = small)
S: {[(color = warm-color) & (shape = polygon) & (size = any-size)]}
G: {[(color = warm-color) & (shape = any-shape) & (size = any-size)],
[(color = any-color) & (shape = polygon) & (size = any-size)]}
-i4: (color = green) & (shape = rectangle) & (size = small)
S: {[(color = warm-color) & (shape = polygon) & (size = any-size)]}
G: {[(color = warm-color) & (shape = any-shape) & (size = any-size)]}
+i5: (color = yellow) & (shape = circle) & (size = large)
S: {[(color = warm-color) & (shape = any-shape) & (size = any-size)]}
G: {[(color = warm-color) & (shape = any-shape) & (size = any-size)]}
The concept is:
(color = warm-color) & (shape = any-shape) & (size = any-size)   ; a warm-colored object
Does the order of the examples count? Why and how?
Consider the following order:
color    shape      size   class
orange   square     large  + (i1)
red      triangle   small  + (i3)
yellow   circle     large  + (i5)
blue     ellipse    small  - (i2)
green    rectangle  small  - (i4)
Discussion
What happens if there are not enough examples for S
and G to become identical?
Could we still learn something useful?
How could we classify a new instance?
When could we be sure that the classification is the
same as the one made if the concept were completely
learned?
Could we be sure that the classification is correct?
 2003, G.Tecuci, Learning Agents Laboratory
67
What happens if there are not enough examples for S
and G to become identical?
Let us assume that one learns only from the first 3 examples:
color    shape     size   class
orange   square    large  + (i1)
blue     ellipse   small  - (i2)
red      triangle  small  + (i3)

The final version space will be:
S: {[(color = warm-color) & (shape = polygon) & (size = any-size)]}
G: {[(color = warm-color) & (shape = any-shape) & (size = any-size)],
    [(color = any-color) & (shape = polygon) & (size = any-size)]}
 2003, G.Tecuci, Learning Agents Laboratory
68
Assume that the final version space is:
G: {[(color = warm-color) & (shape = any-shape) & (size = any-size)],
    [(color = any-color) & (shape = polygon) & (size = any-size)]}
S: {[(color = warm-color) & (shape = polygon) & (size = any-size)]}

How could we classify the following examples, how certain are we about the classification, and why?

color    shape    size   class
blue     circle   large  -
orange   square   small  +
red      ellipse  large  don't know
blue     polygon  small  don't know
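A sketch (Python; the helper names and the explicit ancestor table are mine) of the classification rule this table illustrates: an instance covered by every hypothesis in S is surely positive, an instance covered by no hypothesis in G is surely negative, and anything else is "don't know".

```python
ANCESTORS = {   # value -> the concepts it belongs to (itself included)
    "orange": {"orange", "warm-color", "any-color"}, "red": {"red", "warm-color", "any-color"},
    "blue": {"blue", "cold-color", "any-color"},
    "square": {"square", "polygon", "any-shape"}, "ellipse": {"ellipse", "round", "any-shape"},
    "circle": {"circle", "round", "any-shape"}, "polygon": {"polygon", "any-shape"},
    "large": {"large", "any-size"}, "small": {"small", "any-size"},
}

def covers(h, x):
    return all(hv in ANCESTORS[xv] for hv, xv in zip(h, x))

S = [("warm-color", "polygon", "any-size")]
G = [("warm-color", "any-shape", "any-size"), ("any-color", "polygon", "any-size")]

def classify(x):
    if all(covers(s, x) for s in S):
        return "+"            # covered by the most specific bound, hence by every hypothesis
    if not any(covers(g, x) for g in G):
        return "-"            # covered by no maximally general hypothesis, hence by none
    return "don't know"

for x in [("blue", "circle", "large"), ("orange", "square", "small"),
          ("red", "ellipse", "large"), ("blue", "polygon", "small")]:
    print(x, classify(x))     # reproduces the four classifications in the table above
```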
Discussion
Could the examples contain errors?
What kind of errors could be found in an example?
What will be the result of the learning algorithm if
there are errors in examples?
What could we do if we know that there are errors?
 2003, G.Tecuci, Learning Agents Laboratory
70
Discussion
Could the examples contain errors?
What kind of errors could be found in an example?
- Classification errors:
- positive examples labeled as negative
- negative examples labeled as positive
- Measurement errors
- errors in the values of the attributes
 2003, G.Tecuci, Learning Agents Laboratory
71
What will be the result of the learning algorithm if
there are errors in examples?
Let us assume that the 4th example is incorrectly classified:
color    shape      size   class
orange   square     large  + (i1)
blue     ellipse    small  - (i2)
red      triangle   small  + (i3)
green    rectangle  small  + (i4, incorrect classification)
yellow   circle     large  + (i5)

The version space after the first three examples is:
S: {[(color = warm-color) & (shape = polygon) & (size = any-size)]}
G: {[(color = warm-color) & (shape = any-shape) & (size = any-size)],
    [(color = any-color) & (shape = polygon) & (size = any-size)]}

Continue learning.
 2003, G.Tecuci, Learning Agents Laboratory
72
What could we do if we know that there might be
errors in the examples?
If we cannot find a concept consistent with all the training
examples, then we may try to find a concept that is consistent with
all but one of the examples.
If this fails, then we may try to find a concept that is consistent with
all but two of the examples, and so on.
What is a problem with this approach?
Combinatorial explosion.
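A sketch (Python; 'learn' stands for a candidate-elimination-style routine that returns None when no consistent concept exists, such as the one sketched earlier) of the all-but-k strategy above; the itertools.combinations call is exactly where the combinatorial explosion appears.

```python
from itertools import combinations

def learn_tolerating_errors(examples, learn):
    """Try all examples, then all-but-1, all-but-2, ... until some subset yields a concept."""
    for k in range(len(examples) + 1):
        # C(n, k) subsets to examine at level k -- this is the combinatorial explosion
        for dropped in combinations(range(len(examples)), k):
            subset = [e for i, e in enumerate(examples) if i not in dropped]
            concept = learn(subset)          # assumed to return None if no consistent concept exists
            if concept is not None:
                return concept, dropped      # the dropped examples are the suspected errors
    return None, ()
```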
 2003, G.Tecuci, Learning Agents Laboratory
73
What happens if we extend the generalization language to
include conjunction, disjunction and negation of examples?
Set of examples:
color    shape      size   class
orange   square     large  + (i1)
blue     ellipse    small  - (i2)
red      triangle   small  + (i3)
green    rectangle  small  - (i4)
yellow   circle     large  + (i5)

Background knowledge:
any-color:  warm-color (red, orange, yellow),  cold-color (blue, green, black)
any-shape:  polygon (square, triangle, rectangle),  round (circle, ellipse)
any-size:   large, small

Task:
Learn the concept represented by the above examples by applying the version space method.
 2003, G.Tecuci, Learning Agents Laboratory
74
Set of examples: the same five examples as above (i1 +, i2 -, i3 +, i4 -, i5 +).

Learning process (at each step, S and G below are the minimal generalizations and specializations):
+i1:  G = { all the examples }      S = { i1 }
-i2:  G = { ¬i2 }                   S = { i1 }              ; ¬i2 = all the examples except i2
+i3:  G = { ¬i2 }                   S = { i1 or i3 }
-i4:  G = { ¬i2 and ¬i4 }           S = { i1 or i3 }        ; all the examples except i2 and i4
+i5:  G = { ¬i2 and ¬i4 }           S = { i1 or i3 or i5 }
 2003, G.Tecuci, Learning Agents Laboratory
75
The futility of bias-free learning
A learner that makes no a priori assumptions regarding
the identity of the target concept has no rational basis
for classifying any unseen instance.
 2003, G.Tecuci, Learning Agents Laboratory
76
What happens if we extend the generalization language to include internal disjunction? Does the algorithm still generalize over the observed data?
Generalization(i1, i3): (orange or red, square or triangle, large or small)
Is it different from: i1 or i3?
Set of examples and background knowledge: the same as in the previous problem.

Task:
Learn the concept represented by the above examples by applying the version space method.
 2003, G.Tecuci, Learning Agents Laboratory
77
How is the generalization language extended by the internal
disjunction?
Consider the following generalization hierarchy:
any-shape
  polygon: triangle, rectangle
  circle
How is the generalization language extended by the internal disjunction?
any-shape
  polygon: triangle, rectangle
  circle

The above hierarchy is replaced with the following one:
triangle or rectangle or circle   (= polygon or circle, the most general concept)
  polygon (= triangle or rectangle),  triangle or circle,  rectangle or circle
    triangle,  rectangle,  circle
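A sketch (Python) of the extension just shown: with internal disjunction, the generalization language for an attribute contains every non-empty disjunction (union) of its leaf values, ordered by set inclusion; for {triangle, rectangle, circle} this yields the 7 concepts of the hierarchy above.

```python
from itertools import combinations

def internal_disjunctions(leaves):
    """All non-empty disjunctions of the leaf values, from most specific to most general."""
    return [set(c) for k in range(1, len(leaves) + 1)
            for c in combinations(sorted(leaves), k)]

shapes = internal_disjunctions({"triangle", "rectangle", "circle"})
print(len(shapes))          # 7 concepts: 3 leaves, 3 pairwise disjunctions, 1 triple (any-shape)
for s in shapes:
    print(" or ".join(sorted(s)))
```

The same construction applied to the six colors of the next slide would yield 2^6 - 1 = 63 concepts.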
Consider now the following generalization hierarchy:
any-color
  warm-color: red, orange, yellow
  cold-color: blue, green, black

Which is the corresponding hierarchy containing disjunctions?
 2003, G.Tecuci, Learning Agents Laboratory
80
Could you think of another approach to learning a
disjunctive concept with the candidate elimination
algorithm?
Find a concept1 that is consistent with some of the positive
examples and none of the negative examples.
Remove the covered positive examples from the training set and repeat the procedure for the rest of the examples, computing another concept2 that covers some positive examples, and so on, until there is no positive example left.
The learned concept is “concept1 or concept2 or …”
Could you specify this algorithm better?
Hint: Initialize S with the first positive example, …
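A sketch (Python; 'learn_conjunctive' stands for a candidate-elimination-style learner like the one sketched earlier, and the helper names are mine) of the covering approach described above: repeatedly learn a conjunctive concept from the still-uncovered positive examples and all the negative examples, and return the disjunction of the learned concepts.

```python
def learn_disjunctive(positives, negatives, learn_conjunctive, covers):
    """Sequential covering: each pass starts from one uncovered positive example (cf. the hint above)."""
    remaining, disjuncts = list(positives), []
    while remaining:
        # learn a concept consistent with some of the remaining positives and none of the negatives
        concept = learn_conjunctive(remaining, negatives)
        disjuncts.append(concept)
        # remove the positive examples covered by this concept and repeat
        remaining = [p for p in remaining if not covers(concept, p)]
    return disjuncts          # the learned concept is "disjuncts[0] or disjuncts[1] or ..."
```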
 2003, G.Tecuci, Learning Agents Laboratory
81
Exercise
Consider the following:
Instance language:
  color: {red, orange, yellow, blue, green, black}
Generalization language:
  color: {red, orange, yellow, blue, green, black, warm-color, cold-color, any-color}
A sequence of positive and negative examples of a concept:
  example1 (+): orange
  example2 (-): blue
  example3 (+): red
Background knowledge represented by the following hierarchy:
  any-color
    warm-color: red, orange, yellow
    cold-color: blue, green, black

Apply the candidate elimination algorithm to learn the concept represented by the above examples.
 2003, G.Tecuci, Learning Agents Laboratory
82
Features of the version space method
• In its original form learns only conjunctive descriptions.
• However, it could be applied successively to learn disjunctive
descriptions.
• Requires an exhaustive set of examples.
• Conducts an exhaustive bi-directional breadth-first search.
• The sets S and G can be very large for complex problems.
• It is very important from a theoretical point of view, clarifying the
process of inductive concept learning from examples.
• Has very limited practical applicability because of the combinatorial
explosion of the S and G sets.
• It is at the basis of the powerful Disciple multistrategy learning method
which has practical applications.
 2003, G.Tecuci, Learning Agents Laboratory
83
Recommended reading
Mitchell T.M., Machine Learning, Chapter 2: Concept learning and the
general to specific ordering, pp. 20-51, McGraw Hill, 1997.
Mitchell, T.M., Utgoff P.E., Banerji R., Learning by Experimentation:
Acquiring and Refining Problem-Solving Heuristics, in Readings in Machine
Learning.
Tecuci, G., Building Intelligent Agents, Chapter 3: Knowledge
representation and reasoning, pp. 31-75, Academic Press, 1998.
Barr A. and Feigenbaum E. (Eds.), The Handbook of Artificial Intelligence, vol. III, pp. 385-400 and pp. 484-493.
 2003, G.Tecuci, Learning Agents Laboratory
84