Transcript Lecture 16

Machine learning: building agents that are
capable of learning from their own experience
An autonomous agent is expected to learn from its own experience, not just
to utilize knowledge built in by the agent’s designer. There are at least two
reasons why this is important:
1 In complex environments, the agent may encounter situations which
are not reflected in its knowledge base.
2 In dynamic environments, the world evolves over time. The agent must
be able to revise its internal model of the world to reflect the changing
world.
Note, however, that “... you cannot learn anything unless you almost know
it already” (Martin’s law, formulated by William Martin in 1979).
We distinguish two kinds of machine learning depending on whether the goal
is to “learn” new knowledge or to “update” the existing knowledge. Let us refer
to the first kind of ML as data mining; it is based on digging useful
descriptions out of data and formalizing them in an appropriate representation.
The second kind of ML is based on acquiring new information and fitting it into
the current knowledge base; this is referred to as knowledge refinement.
Data mining methods
– Learning by recording cases (learning by analogy). Situations are recorded
as-is; nothing is done to the information in those situations until it is used.
When a new situation is encountered and you have to guess a property of an
object, given nothing else but the set of recorded situations, find the most
similar one and assume that the unknown property is the same as in that
reference situation.
– Learning by building identification trees (learning via rule induction). By
looking for regularities in the data, we can build identification trees, which
are then used to classify unknown information.
– Learning by training neural networks. In neural nets, neuron-like elements
are arranged in networks, which are then used to recognize instances of
patterns. The procedure used to train the net is called back-propagation; it
alters the effect of one simulated neuron on another in order to improve
overall performance.
– Learning by training perceptrons. Perceptrons are a special kind of neural
net, so simple that they can be viewed as composed of just one neuron-like
element. By means of the so-called convergence procedure, the performance of
the perceptron can be improved until it correctly classifies objects.
– Learning by simulating evolution. The so-called genetic algorithms are based
on ideas analogous to individuals, chromosome crossover, gene mutation,
natural selection, etc., and are intended to simulate certain characteristics
of heredity and evolution.
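Learning by recording cases amounts to nearest-neighbour classification. A minimal sketch in Python; the stored situations and the match-counting similarity measure are illustrative assumptions, not part of the lecture:

```python
# Learning by recording cases: store situations verbatim, then classify a new
# situation by copying the unknown property from the most similar stored case.
# Similarity here is simply the number of matching attributes (an assumption).

cases = [  # (attributes, known class) -- illustrative recorded situations
    ({"engine": "prop", "wing": "high", "tail": "conventional"}, "C130"),
    ({"engine": "jet",  "wing": "high", "tail": "T-tail"},       "C141"),
    ({"engine": "jet",  "wing": "low",  "tail": "conventional"}, "B747"),
]

def similarity(a, b):
    """Number of attributes on which two situations agree."""
    return sum(1 for k in a if a.get(k) == b.get(k))

def classify(new_situation):
    """Return the class of the most similar recorded case."""
    _, best_class = max(cases, key=lambda c: similarity(c[0], new_situation))
    return best_class

print(classify({"engine": "jet", "wing": "high", "tail": "T-tail"}))  # C141
```

Note that no generalization happens at storage time; all the work is deferred to the moment a query arrives.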
Knowledge refinement methods
– Learning by analyzing differences (learning from observations). This is based
on analyzing the differences that appear in a sequence of observations
(positive and negative examples). The goal is to learn to correctly recognize
members of a given class. The learning process starts by declaring the initial
example to be the model; this initial model is then incrementally improved
using a series of further examples. Negative examples serve to specialize the
model, while positive examples allow us to generalize it so that it recognizes
all members of the class.
– Learning by explaining experience (explanation-based learning). Explanations
derived from causal chains are put together into a new, “simpler” dependency,
which can next time be applied directly to a similar initial situation.
– Learning by managing multiple models. This method uses positive and negative
examples to create a version space, within which it is possible to determine
what it takes to be a member of a class.
– Learning by correcting mistakes (knowledge revision). When an error is
identified, the system tries to reveal the culprit by analyzing the
problem-solving process and building an explanation of why the error occurred.
The system then uses the explanation to revise the model and get rid of the
error.
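The generalize/specialize cycle of learning by analyzing differences can be sketched as follows. The dictionary representation of the model and the update rules are illustrative assumptions:

```python
# Sketch of learning by analyzing differences. The model is a dict of
# required attribute values, initialized from the first positive example.
# (Representation and update rules are illustrative assumptions.)

def generalize(model, positive):
    """Positive example: drop any requirement the example contradicts."""
    return {k: v for k, v in model.items() if positive.get(k) == v}

def specialize(model, near_miss, reference):
    """Near-miss negative example: reinstate, from the reference example,
    one requirement that rules the near-miss out."""
    for k, v in reference.items():
        if k not in model and near_miss.get(k) != v:
            return {**model, k: v}
    return model

first = {"engine": "jet", "wing": "high", "tail": "T-tail"}  # initial model
# A positive example with a low wing shows wing position is irrelevant:
m = generalize(dict(first), {"engine": "jet", "wing": "low", "tail": "T-tail"})
print(m)   # {'engine': 'jet', 'tail': 'T-tail'}
# A near-miss with a conventional tail forces the tail requirement back in:
m2 = specialize({"engine": "jet"},
                {"engine": "jet", "wing": "high", "tail": "conventional"},
                first)
print(m2)  # {'engine': 'jet', 'tail': 'T-tail'}
```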
Learning via rule induction: an example
Consider the classification task of recognizing different types of aircraft based
on their characteristics, and assume that an appropriate and sufficient set of test
cases (i.e. examples of correct classifications) is available. Assume also that
there are four classes of aircraft: C130, C141, C5A and B747. Our task is to
correctly classify an unknown object as a member of one of these four classes.
Step 1 Identify the classes. Here we have four classes of aircraft: C130,
C141, C5A and B747.
Step 2 Identify the attributes of class members.
Relevant attributes:
– Number of engines (2, 3, 4)
– Type of engines (jet, propeller)
– Wing position (high, low)
– Wing shape (swept back, conventional)
– Tail shape (T-shaped, conventional)
– Bulges on the fuselage (aft of the cockpit, aft of the wing, under the wing,
none)
Attributes that can be ignored:
– size and dimensions
– color and markings
– speed and altitude
Example (cont.)
The rule induction process consists of the following steps:
1 Building a table describing objects, selected attributes and their values.
object | engine type | wing position | wing shape   | tail shape   | bulges
-------+-------------+---------------+--------------+--------------+----------------
C130   | prop        | high          | conventional | conventional | under wings
C141   | jet         | high          | swept-back   | T-tail       | aft of wing
C5A    | jet         | high          | swept-back   | T-tail       | none
B747   | jet         | low           | swept-back   | conventional | aft of cockpit
2 Building the decision tree, where each node is either a question about the
value of a given attribute, or a conclusion. Edges coming out of the
question nodes represent one of the possible values of the attribute.
Let us choose (arbitrarily) the root node of the tree to be the engine type:
engine type
├─ prop → C130
└─ jet → wing shape
    ├─ conventional → ?
    └─ swept-back → wing position
        ├─ low → B747
        └─ high → tail shape
            ├─ conventional → ?
            └─ T-tail → bulges
                ├─ none → C5A
                ├─ aft of wing → C141
                ├─ aft of cockpit → ?
                └─ under wing → ?
Decision trees are not unique: alternative trees can be obtained by reordering
the nodes, and in this way nodes that lead to impossible conclusions can be
eliminated. The following is an alternative tree:
engine type
├─ prop → C130
└─ jet → wing position
    ├─ low → B747
    └─ high → bulges
        ├─ none → C5A
        └─ aft of wing → C141
3 Generating rules from trees by means of the following algorithm:
A. Identify a conclusion node that has not yet been dealt with.
B. Trace the path from the conclusion node backward to the root node.
C. The conclusion forms the “then” part of the rule, and the rest of the
nodes along a given path form the “if” part of the rule.
D. Repeat this process for each conclusion node.
The following rules are acquired from the latter decision tree:
Rule 1: If (engine-type = prop)
Then (plane = C130)
Rule 2: If (engine-type = jet)
(wing-position = low)
Then (plane = B747)
Rule 3: If (engine-type = jet)
(wing-position = high)
(bulges = none)
Then (plane = C5A)
Rule 4: If (engine-type = jet)
(wing-position = high)
(bulges = aft-of-wing)
Then (plane = C141)
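The backward-tracing algorithm above can be sketched in Python. The nested-tuple encoding of the tree is an illustrative assumption; it encodes the alternative (reordered) decision tree:

```python
# Extract if-then rules from a decision tree by collecting, for each
# conclusion (leaf) node, the attribute tests on the path from the root.
# Internal nodes are (attribute, {value: subtree}) pairs; leaves are classes.

tree = ("engine-type", {
    "prop": "C130",
    "jet": ("wing-position", {
        "low": "B747",
        "high": ("bulges", {
            "none": "C5A",
            "aft-of-wing": "C141",
        }),
    }),
})

def rules_from_tree(node, conditions=()):
    """Yield (conditions, conclusion) pairs, one per conclusion node."""
    if isinstance(node, str):                      # conclusion node reached
        yield list(conditions), node
        return
    attribute, branches = node
    for value, subtree in branches.items():        # one edge per value
        yield from rules_from_tree(subtree, conditions + ((attribute, value),))

for conds, conclusion in rules_from_tree(tree):
    ifs = " and ".join(f"{a} = {v}" for a, v in conds)
    print(f"If {ifs} Then plane = {conclusion}")
```

Each printed rule corresponds to one root-to-leaf path, exactly as in steps A–D above.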
Note that this rule base is not the most efficient one. A more efficient set
of rules (efficiency being measured here by the amount of data the system
needs in order to correctly classify an object) is obtained from the
following tree:
bulges
├─ none → C5A
├─ aft of wing → C141
├─ aft of cockpit → B747
└─ under wing → C130
The corresponding rule base is the following:
Rule 1: If (bulges = none)
Then (plane = C5A)
Rule 2: If (bulges = aft-of-wings)
Then (plane = C141)
Rule 3: If (bulges = aft-of-cockpit)
Then (plane = B747)
Rule 4: If (bulges = under-wings)
Then (plane = C130)
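Since this tree tests a single attribute, the whole rule base collapses into a lookup table: one observation suffices to classify the aircraft. A minimal sketch:

```python
# The four bulge rules reduce to a single-attribute lookup.
PLANE_BY_BULGES = {
    "none": "C5A",
    "aft-of-wings": "C141",
    "aft-of-cockpit": "B747",
    "under-wings": "C130",
}

def classify(bulges):
    """Classify an aircraft from its bulge configuration alone."""
    return PLANE_BY_BULGES[bulges]

print(classify("aft-of-cockpit"))  # B747
```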
The ID3 algorithm for rule generation.
Note that learning based on decision trees is quite limited; it can only be
applied in very simple, completely specified worlds. The ID3 algorithm is an
extension of the decision-tree method which provides a more systematic way to
acquire rules from test cases.
Example: Assume you want to build a KBS advising about market investments
based on a set of historic cases. Assume also that investment opportunities are
limited to:
– Investment in blue chip stocks.
– Investment in North American gold mining stocks.
– Investment in mortgage-related securities.
The system must determine the most successful investment for a given set of
conditions.
Step 1: Identification of a set of attributes
– interest rates
– amount of cash available in Japan, Europe and the U.S.
– the degree of international tension
Step 2: Given historical data, build a table representing these cases
       | Fund type        | Interest rates | Cash available | Tension | Fund value
case 1 | Blue chip stocks | high           | high           | medium  | medium
case 2 | Blue chip stocks | low            | high           | medium  | high
case 3 | Blue chip stocks | medium         | low            | high    | low
case 4 | Gold stocks      | high           | high           | medium  | high
case 5 | Gold stocks      | low            | high           | medium  | medium
case 6 | Gold stocks      | medium         | low            | high    | medium
case 7 | Mortgage-related | high           | high           | medium  | low
case 8 | Mortgage-related | low            | high           | medium  | high
case 9 | Mortgage-related | medium         | low            | high    | low
(Fund type, interest rates, cash available and tension are the attributes;
fund value is the class.)
Example (cont.)
3. Build a decision tree based on the entropy of each attribute. Entropy is a
measure of the uncertainty of a given attribute: the higher the entropy, the
higher the uncertainty of its values. At each node, ID3 selects the attribute
whose test most reduces this uncertainty.
(Please refer to the handouts distributed in class)
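The attribute-selection step can be sketched as the standard ID3 information-gain computation over the cases above; the encoding of the cases follows the Step 2 table, and the printed ranking is a consistency check rather than part of the handouts:

```python
import math
from collections import Counter

# Historic cases: (fund-type, interest-rates, cash-available, tension) -> fund value
cases = [
    (("blue-chip", "high",   "high", "medium"), "medium"),
    (("blue-chip", "low",    "high", "medium"), "high"),
    (("blue-chip", "medium", "low",  "high"),   "low"),
    (("gold",      "high",   "high", "medium"), "high"),
    (("gold",      "low",    "high", "medium"), "medium"),
    (("gold",      "medium", "low",  "high"),   "medium"),
    (("mortgage",  "high",   "high", "medium"), "low"),
    (("mortgage",  "low",    "high", "medium"), "high"),
    (("mortgage",  "medium", "low",  "high"),   "low"),
]
ATTRS = ["fund-type", "interest-rates", "cash-available", "tension"]

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(attr_index):
    """Class entropy minus expected entropy after splitting on the attribute."""
    all_labels = [label for _, label in cases]
    remainder = 0.0
    for value in {attrs[attr_index] for attrs, _ in cases}:
        subset = [label for attrs, label in cases if attrs[attr_index] == value]
        remainder += len(subset) / len(cases) * entropy(subset)
    return entropy(all_labels) - remainder

for i, name in enumerate(ATTRS):
    print(f"{name}: gain = {information_gain(i):.3f}")
# interest-rates and fund-type tie for the highest gain, which matches the
# rules below: they condition on exactly these two attributes.
```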
4. Acquiring the rules from the resulting tree.
Rule 1: If (interest-rates = high)
(fund-type = blue-chip)
Then: (fund-value = medium)
Rule 2: If (interest-rates = high)
(fund-type = gold-stocks)
Then: (fund-value = high)
Rule 3: If (interest-rates = high)
(fund-type = mortgage-related)
Then: (fund-value = low)
Example (cont.)
Rule 4: If (interest-rates = medium)
(fund-type = blue-chip)
Then: (fund-value = low)
Rule 5: If (interest-rates = medium)
(fund-type = gold-stocks)
Then: (fund-value = medium)
Rule 6: If (interest-rates = medium)
(fund-type = mortgage-related)
Then: (fund-value = low)
Rule 7: If (interest-rates = low)
(fund-type = blue-chip)
Then: (fund-value = high)
Rule 8: If (interest-rates = low)
(fund-type = gold-stocks)
Then: (fund-value = medium)
Rule 9: If (interest-rates = low)
(fund-type = mortgage-related)
Then: (fund-value = high)
Problems with rule induction methods based on
test cases
1 The quality of the rule base depends on the quality of the test cases. Note
that the number of test cases is not a criterion of quality, because several
cases may describe similar situations.
2 There may be conflicts between test cases, which means that additional
attributes may need to be considered.
3 For large domains, this approach results in huge trees, and thus in
inefficient rule bases.
4 This approach develops “flat rules”, i.e. each rule results in a final
conclusion.