KU NLP
Ch 9. Machine Learning: Symbol-Based
9.0 Introduction
9.1 A Framework for Symbol-Based Learning
9.2 Version Space Search
The Candidate Elimination Algorithm
9.3 ID3 Decision Tree Induction Algorithm
9.5 Knowledge and Learning
Explanation-Based Learning
9.6 Unsupervised Learning
Conceptual clustering
9.0 Introduction
Learning
Agents learn through the course of their interactions with the world,
as well as through the experience of their own internal states and processes
Learning is important for practical applications of AI
Knowledge engineering bottleneck
major obstacle to the widespread use of intelligent systems
the cost and difficulty of building expert systems using
traditional knowledge acquisition techniques
One solution
is for the program to begin with a minimal amount of knowledge
and learn from examples, high-level advice, and its own
explorations of the domain
9.0 Introduction
Definition of learning
“Any change in a system that allows it to perform better the second
time on repetition of the same task or on another task drawn from
the same population” (Simon, 1983)
Views of Learning
Generalization from experience
Induction: the learner must generalize correctly to unseen instances
of the domain
Inductive biases: selection criteria (the learner must select the most
effective aspects of its experience)
Changes in the learner
acquisition of explicitly represented domain knowledge:
based on its experience, the learner constructs or modifies
expressions in a formal language (e.g. logic)
9.0 Introduction
Learning Algorithms vary in
goals, available training data, learning strategies and
knowledge representation languages
All algorithms learn by searching through a space
of possible concepts to find an acceptable
generalization (concept space Fig. 9.5)
Inductive learning
learning a generalization from a set of examples
concept learning is a typical inductive learning task
infer a definition from given examples of some concept (e.g.
cat, soybean disease)
allowing the learner to correctly recognize future instances of that concept
Two algorithms: version space search and ID3
9.0 Introduction
Similarity-based vs. Explanation-based
Similarity-based (data-driven)
using no prior knowledge of the domain
rely on large numbers of examples
generalization on the basis of patterns in training data
Explanation-based learning (prior-knowledge-driven)
using prior knowledge of the domain to guide generalization
learning by analogy and other techniques that utilize prior knowledge
to learn from a limited amount of training data
9.0 Introduction
Supervised vs. Unsupervised
supervised learning
learning from training instances of known classification
unsupervised learning
learning from unclassified training data
conceptual clustering or category formation
9.1 A Framework for Symbol-Based Learning
Learning algorithms are characterized by a general
model (Fig. 9.1, p. 354)
Data and goals of the learning task
Representation Language
A set of operations
Concept space
Heuristic Search
Acquired knowledge
A general model of the learning process (Fig. 9.1)
9.1 A Framework for Symbol-Based Learning
Data and Goals
Type of data
positive or negative examples
Single positive example and domain-specific knowledge
high-level advice (e.g. condition of loop termination)
analogies (e.g. electricity vs. water)
Goal of learning algorithms: acquisition of
a concept: a general description of a class of objects
plans
problem-solving heuristics
other forms of procedural knowledge
Properties and quality of data
data may come from the outside environment (e.g. a teacher)
or be generated by the program itself
reliable or contain noise
well-structured or unorganized
positive and negative or only positive
9.1 A Framework for Symbol-Based Learning
Data and goals of three learning settings:
Concept learning: the data are positive/negative examples of a target class; the goal is to infer a general definition
Explanation-based: the data are a training example plus prior knowledge; the goal is to infer a general concept
Clustering: the data are a set of unclassified instances; the goal is to discover categorizations
9.1 A Framework for Symbol-Based Learning
Representation of learned knowledge
concept expressions in predicate calculus
A simple formulation of the concept learning problem represents
instances and concepts as conjunctive sentences containing variables
(a sketch of one possible encoding follows this list):
size(obj1, small) ^ color(obj1, red) ^ shape(obj1, round)
size(obj2, large) ^ color(obj2, red) ^ shape(obj2, round)
=> size(X, Y) ^ color(X, red) ^ shape(X, round)
structured representations such as frames
descriptions of plans as a sequence of operations or a triangle table
representation of heuristics as problem-solving rules
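As a concrete illustration, such a conjunctive expression can be encoded as a Python dict from predicate name to value, with a sentinel standing in for a variable. This is a minimal sketch of one assumed encoding, not the book's notation:

VAR = "?"  # sentinel for an unbound variable such as Y above

# size(obj1, small) ^ color(obj1, red) ^ shape(obj1, round)
obj1 = {"size": "small", "color": "red", "shape": "round"}
# size(X, Y) ^ color(X, red) ^ shape(X, round)
concept = {"size": VAR, "color": "red", "shape": "round"}

def matches(concept, instance):
    # A concept covers an instance if every non-variable value agrees.
    return all(v == VAR or instance.get(k) == v for k, v in concept.items())

print(matches(concept, obj1))  # True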
9.1 A Framework for Symbol-Based Learning
A set of operations
Given a set of training instances, the learner must construct a
generalization, heuristic rule, or plan that satisfies its goal
Requires ability to manipulate representations
Typical operations include
generalizing or specializing symbolic expressions
adjusting the weights in a neural network
modifying the program’s representations
Concept space
defines a space of potential concept definitions
the complexity of the potential concept space is a measure of the
difficulty of the learning problem
9.1 A Framework for Symbol-Based Learning
Heuristic Search
Use available training data and heuristics to search efficiently
Patrick Winston's work on learning concepts from positive and
negative examples along with near misses (Fig. 9.2)
The program learns by refining a candidate description of the target
concept through generalization and specialization (a toy sketch
follows this list)
Generalization changes the candidate description to let it
accommodate new positive examples (Fig. 9.3)
Specialization changes the candidate description to exclude near
misses (Fig. 9.4)
The performance of the learning algorithm is highly sensitive to the
quality and order of the training examples
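A toy sketch of how a near miss drives specialization, representing a structure as a set of relation triples; this is illustrative only, not Winston's actual program or representation:

# An arch: two uprights supporting a lintel.
arch = {("supports", "a", "c"), ("supports", "b", "c"), ("left_of", "a", "b")}

# A near miss: the same structure, except that the uprights touch.
near_miss = arch | {("touches", "a", "b")}

# Specialization: the relation present only in the near miss becomes a
# forbidden ("must-not") link in the candidate description of an arch.
forbidden = near_miss - arch
print(forbidden)  # {('touches', 'a', 'b')}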
Examples and near misses for the concept “Arch” (Fig. 9.2)
Generalization of descriptions (Figure 9.3)
Generalization of descriptions (Figure 9.3, continued)
Specialization of descriptions (Figure 9.4)
9.2 Version Space Search
Version space search implements inductive learning as search
through a concept space
Generalization operations impose an ordering on the concepts in the
space, and the algorithms use this ordering to guide the search
9.2.1 Generalization Operators and Concept Space
9.2.2 Candidate Elimination Algorithm
9.2.1 Generalization Operators and the Concept Space
Primary generalization operations used in ML (three are sketched in code after this list)
Replacing constants with variables
color(ball, red) -> color(X, red)
Dropping conditions from a conjunctive expression
shape(X, round) ^ size(X, small) ^ color(X, red)
-> shape(X, round) ^ color(X, red)
Adding a disjunct to an expression
shape(X, round) ^ size(X, small) ^ color(X, red)
-> shape(X, round) ^ size(X, small) ^ (color(X, red) ∨ color(X, blue))
Replacing a property with its parent in a class hierarchy
color(X, red)
-> color(X, primary_color), if primary_color is a superclass of red
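A minimal sketch of three of these operators, assuming the dict encoding introduced in Section 9.1 above; the operator names and the toy hierarchy are illustrative, not the book's:

VAR = "?"
SUPERCLASS = {"red": "primary_color", "blue": "primary_color"}  # assumed toy hierarchy

def replace_constant_with_variable(concept, attr):
    # color(ball, red) -> color(X, red): one constant becomes a variable.
    g = dict(concept)
    g[attr] = VAR
    return g

def drop_condition(concept, attr):
    # Remove one conjunct from a conjunctive expression.
    return {k: v for k, v in concept.items() if k != attr}

def climb_hierarchy(concept, attr):
    # color(X, red) -> color(X, primary_color): replace a property with
    # its parent in a class hierarchy.
    g = dict(concept)
    g[attr] = SUPERCLASS.get(g[attr], g[attr])
    return g

c = {"shape": "round", "size": "small", "color": "red"}
print(replace_constant_with_variable(c, "size"))  # size becomes "?"
print(drop_condition(c, "size"))                  # size conjunct removed
print(climb_hierarchy(c, "color"))                # red -> primary_color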
9.2.1 Generalization Operators and the Concept Space
Notion of covering
If concept P is more general than concept Q, we say that
“P covers Q”
color(X,Y) covers color(ball,Y), which in turn covers color(ball,red)
Concept space
Defines a space of potential concept definitions
The example concept space representing the
predicate obj(Sizes, Colors, Shapes) with properties and values
Sizes = {large, small}
Colors = {red, white, blue}
Shapes = {ball, brick, cube}
is presented in Figure 9.5 (p. 362); the covers relation over this space is sketched below
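A small sketch of the covers relation over this space, assuming concepts are encoded as (size, color, shape) tuples with "?" for a variable (an illustrative encoding, not the book's):

from itertools import product

VAR = "?"
SIZES = {"large", "small"}
COLORS = {"red", "white", "blue"}
SHAPES = {"ball", "brick", "cube"}

def covers(p, q):
    # p covers q if, wherever p names a constant, q has the same constant.
    return all(pv == VAR or pv == qv for pv, qv in zip(p, q))

print(covers((VAR, "red", VAR), ("small", "red", "ball")))   # True
print(covers(("small", "red", VAR), (VAR, "red", "ball")))   # False

# Every concept places a value or a variable in each position:
space = list(product(SIZES | {VAR}, COLORS | {VAR}, SHAPES | {VAR}))
print(len(space))  # 3 * 4 * 4 = 48 candidate definitions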
A Concept Space (Fig. 9.5)
9.2.2 The candidate elimination algorithm
Version space: the set of all concept descriptions
consistent with the training examples.
The algorithm reduces the size of the version space as
more examples become available (Fig. 9.10)
Specific to general search from positive examples
General to specific search from negative examples
Candidate elimination algorithm combines these into a bi-
directional search
Generalize based on regularities found in the
training data
Supervised learning
9.2.2 The candidate elimination algorithm
The learned concept must be general enough to cover
all positive examples, and specific enough to
exclude all negative examples
Maximally specific generalization
A concept c is maximally specific if it covers all positive examples,
none of the negative examples, and for any other concept c′ that covers
the positive examples, c ≤ c′
Maximally general specialization
A concept c is maximally general if it covers none of the negative
training instances, and for any other concept c′ that covers no
negative training instance, c ≥ c′
Specific to General Search
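A minimal sketch of the specific-to-general direction, assuming the (size, color, shape) tuple encoding with "?" as a variable: maintain a maximally specific hypothesis and minimally generalize it to cover each new positive example. The training sequence here is illustrative:

VAR = "?"

def generalize(h, example):
    # Smallest generalization of h that covers example: keep values that
    # agree, replace each disagreement with a variable.
    return tuple(hv if hv == ev else VAR for hv, ev in zip(h, example))

positives = [("small", "red", "ball"),
             ("small", "white", "ball"),
             ("large", "blue", "ball")]

S = positives[0]          # initialize S to the first positive instance
for p in positives[1:]:
    S = generalize(S, p)
print(S)                  # ('?', '?', 'ball'), i.e. obj(X, Y, ball)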
Specific to General Search (Fig. 9.7)
General to Specific Search
General to Specific Search (Fig. 9.8)
9.2.2 The candidate elimination algorithm
Begin
    Initialize G to the most general concept in the space;
    Initialize S to the first positive training instance;
    For each new positive instance p:
    Begin
        Delete all members of G that fail to match p;
        For every s in S, if s does not match p, replace s with its most
            specific generalizations that match p and are more specific
            than some member of G;
        Delete from S any hypothesis more general than some other hypothesis in S;
    End;
    For each new negative instance n:
    Begin
        Delete all members of S that match n;
        For each g in G that matches n, replace g with its most general
            specializations that do not match n and are more general
            than some member of S;
        Delete from G any hypothesis more specific than some other hypothesis in G;
    End;
End
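A compact Python rendering of this pseudocode for the obj(Sizes, Colors, Shapes) space of Fig. 9.5. This is an illustrative sketch under the tuple encoding assumed earlier, not the book's implementation; it omits the deletion of redundant members of S, which the single-element S of this representation never needs:

VAR = "?"
DOMAINS = [("large", "small"),
           ("red", "white", "blue"),
           ("ball", "brick", "cube")]

def covers(p, q):
    return all(pv == VAR or pv == qv for pv, qv in zip(p, q))

def generalize(s, p):
    # Most specific generalization of s that matches p.
    return tuple(sv if sv == pv else VAR for sv, pv in zip(s, p))

def specializations(g, n):
    # Most general specializations of g that fail to match n: fill one
    # variable position with any legal value other than n's value there.
    out = []
    for i, gv in enumerate(g):
        if gv == VAR:
            out += [g[:i] + (v,) + g[i + 1:] for v in DOMAINS[i] if v != n[i]]
    return out

def candidate_elimination(examples):
    G = [(VAR, VAR, VAR)]         # the most general concept in the space
    S = []                        # set by the first positive instance
    for x, is_positive in examples:
        if is_positive:
            G = [g for g in G if covers(g, x)]                  # g must match p
            S = [x] if not S else [generalize(s, x) for s in S]
            S = [s for s in S if any(covers(g, s) for g in G)]  # stay below G
        else:
            S = [s for s in S if not covers(s, x)]              # s must not match n
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                else:
                    new_G += [h for h in specializations(g, x)
                              if any(covers(h, s) for s in S)]  # stay above S
            # Delete any member more specific than another member of G.
            G = [g for g in new_G
                 if not any(h != g and covers(h, g) for h in new_G)]
    return G, S

# A training sequence in the style of Fig. 9.9:
examples = [(("small", "red", "ball"), True),
            (("small", "blue", "ball"), False),
            (("large", "red", "ball"), True)]
print(candidate_elimination(examples))
# ([('?', 'red', '?')], [('?', 'red', 'ball')]), i.e. obj(X, red, Y) and obj(X, red, ball)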
9.2.2 The candidate elimination algorithm (Fig. 9.9)
9.2.2 The candidate elimination algorithm
Combining the two directions of search into a
single algorithm has several benefits.
The G and S sets summarize the information in the negative
and positive training instances
Fig. 9.10 gives an abstract description of the
candidate elimination algorithm.
“+” signs represent positive instances
“-” signs indicate negative instances
The search “shrinks” the outermost concept to exclude
negative instances
The search “expands” the innermost concept to include new
positive instances
9.2.2 The candidate elimination algorithm
The incremental nature of the learning algorithm
It accepts training instances one at a time, forming a usable,
although possibly incomplete, generalization after each
example (unlike batch algorithms such as ID3)
Even before the algorithm converges on a single
concept, the G and S sets provide usable
constraints on that concept (see the classification sketch below)
If c is the goal concept, then for all g ∈ G and s ∈ S, s ≤ c ≤ g
Any concept that is more general than some concept in G
will cover some negative instance; any concept that is more
specific than some concept in S will fail to cover some
positive instances
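A sketch of this constraint used as a classifier, under the same assumed tuple encoding: an instance covered by every concept in S is positive, one covered by no concept in G is negative, and anything in between is still undetermined:

VAR = "?"

def covers(p, q):
    return all(pv == VAR or pv == qv for pv, qv in zip(p, q))

def classify(G, S, x):
    if S and all(covers(s, x) for s in S):
        return "positive"    # every s covers x, and s <= c, so c covers x
    if not any(covers(g, x) for g in G):
        return "negative"    # no g covers x, and c <= g, so c cannot cover x
    return "unknown"         # the version space has not yet decided

G, S = [(VAR, "red", VAR)], [(VAR, "red", "ball")]
print(classify(G, S, ("small", "red", "ball")))   # positive
print(classify(G, S, ("small", "blue", "cube")))  # negative
print(classify(G, S, ("large", "red", "brick")))  # unknown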
9.2.4 Evaluating Candidate Elimination
Problems
combinatorics of the problem space: excessive growth of the search
space
It is useful to develop heuristics for pruning states from G and S
(beam search)
Using an inductive bias to reduce the size of the concept space
trades expressiveness for efficiency
The algorithm may fail to converge because of noise or
inconsistency in training data
One solution to this problem is to maintain multiple G and S sets
Contribution
explication of the relationship between knowledge representation,
generalization, and search in inductive learning