Chapter 9 Machine Learning
KU NLP
9.3 The ID3 Decision Tree Induction Algorithm
ID3 induces concepts from examples.
ID3 represents concepts as decision trees.
Decision tree: a representation that allows us to determine the
classification of an object by testing its values for certain
properties
An example problem: estimating an individual’s credit risk on the basis of credit history, current debt, collateral, and income
Table 9.1 lists a sample of individuals with known credit risks.
The decision tree of Fig. 9.13 represents the classifications in
Table 9.1
Machine Learning
Data from credit history of loan applications (Table 9.1)
A decision tree for credit risk assessment (Fig. 9.13)
In a decision tree,
Each internal node represents a test on some property such as
credit history or debt
Each possible value of the property corresponds to a branch of
the tree such as high or low
Leaf nodes represent classifications such as low or moderate
risk
An individual of unknown type may be classified by traversing the
decision tree.
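As a sketch, the traversal described above can be written in a few lines of Python. The tree fragment below is a hypothetical encoding in the spirit of Fig. 9.13, not a transcription of the figure itself; the property names and branch values are illustrative.

```python
# A minimal sketch of decision-tree classification. Internal nodes are
# dicts naming a test property and its branches; leaves are class labels.
def classify(tree, example):
    """Walk from the root, following the branch matching the example's
    value for each tested property, until a leaf is reached."""
    while isinstance(tree, dict):          # internal node: a test
        prop = tree["test"]
        tree = tree["branches"][example[prop]]
    return tree                            # leaf: a classification

# A two-level fragment of a credit-risk tree (hypothetical encoding).
tree = {"test": "income",
        "branches": {"0 to 15k": "high",
                     "15 to 35k": {"test": "credit history",
                                   "branches": {"bad": "high",
                                                "unknown": "high",
                                                "good": "moderate"}},
                     "over 35k": {"test": "credit history",
                                  "branches": {"bad": "moderate",
                                               "unknown": "low",
                                               "good": "low"}}}}

print(classify(tree, {"income": "over 35k", "credit history": "good"}))  # low
```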
The size of the tree necessary to classify a given set
of examples varies according to the order with which
properties are tested.
Fig. 9.14 shows a tree that is simpler than Fig. 9.13 but that still correctly
classifies the examples in Table 9.1
A simplified decision tree (Fig. 9.14)
Choice of the optimal tree
measure: the greatest likelihood of correctly classifying unseen data
assumption of the ID3 algorithm: “the simplest decision tree that covers all the training examples” is the optimal tree
the rationale for this assumption is the time-honored heuristic of preferring simplicity and avoiding unnecessary assumptions
Occam’s Razor principle: “It is vain to do with more what can be done with less. ... Entities should not be multiplied beyond necessity.”
9.3.1 Top-down Decision Tree Induction
ID3 algorithm
constructs the decision tree in a top-down fashion
selects a property to test at the current node of the tree
uses the property to partition the set of examples
recursively constructs a subtree for each partition
continues until all members of a partition are in the same class
Because the order of tests is critical, ID3 relies on its criterion for
selecting the test
For example, ID3 constructs Fig. 9.14 from Table 9.1:
ID3 selects INCOME as the root property => Fig. 9.15
The partition {1,4,7,11} consists entirely of high-risk individuals, and CREDIT
HISTORY further divides the partition {2,3,12,14} into {2,3}, {14}, and {12}
=> Fig. 9.16
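The recursion above can be sketched as follows. This is a minimal sketch: the property-selection step is a placeholder that simply takes the next property in the list, since the information-theoretic criterion is only introduced in Section 9.3.2, and the small example set is invented for illustration.

```python
from collections import Counter

# Top-down ID3 loop: pick a property, partition the examples on its
# values, and recurse until a partition is pure.
def id3(examples, properties, label="risk"):
    classes = [e[label] for e in examples]
    if len(set(classes)) == 1:              # partition is pure: make a leaf
        return classes[0]
    if not properties:                      # no tests left: majority leaf
        return Counter(classes).most_common(1)[0][0]
    prop = properties[0]                    # placeholder selection criterion
    rest = properties[1:]
    branches = {}
    for e in examples:                      # partition on prop's values
        branches.setdefault(e[prop], []).append(e)
    return {"test": prop,
            "branches": {v: id3(sub, rest, label)
                         for v, sub in branches.items()}}

examples = [{"income": "low", "debt": "high", "risk": "high"},
            {"income": "low", "debt": "low", "risk": "high"},
            {"income": "high", "debt": "high", "risk": "moderate"},
            {"income": "high", "debt": "low", "risk": "low"}]
tree = id3(examples, ["income", "debt"])
```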
Machine Learning
41
Decision Tree Construction Algorithm
A partially constructed decision tree (Fig. 9.15)
Another partially constructed decision tree (Fig. 9.16)
9.3.2 Information Theoretic Test Selection
Test selection method
strategy: use information theory to select the test (property)
procedure: measure the information gain of each property and pick the property providing the greatest information gain
Information gain from property P:

gain(P) = I(C) - E(P)

I(C): the total information content of the tree
E(P): the expected information needed to complete the tree after testing P

I(C) = \sum_{i=1}^{n} -p(C_i)\log_2 p(C_i), \qquad E(P) = \sum_{i=1}^{n} \frac{|C_i|}{|C|} I(C_i)

C: the set of training instances; n: the number of values in property P; C_i: a subset of the partition {C_1, C_2, ..., C_n} of C induced by the property's values
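These definitions translate directly into code. A minimal sketch, assuming classes and partitions are given as plain Python lists of class labels:

```python
from collections import Counter
from math import log2

# I(C): entropy of the class distribution of example set C.
def information(classes):
    """I(C) = sum_i -p(Ci) * log2(p(Ci)) over the classes occurring in C."""
    n = len(classes)
    return sum(-(c / n) * log2(c / n) for c in Counter(classes).values())

# E(P): weighted entropy of the partition {C1..Cn} induced by property P.
def expected_information(partition):
    """E(P) = sum_i |Ci|/|C| * I(Ci) over the subsets Ci."""
    total = sum(len(ci) for ci in partition)
    return sum(len(ci) / total * information(ci) for ci in partition)

def gain(classes, partition):
    """gain(P) = I(C) - E(P)."""
    return information(classes) - expected_information(partition)

# class distribution of Table 9.1: 6 high, 3 moderate, 5 low
risks = ["high"] * 6 + ["moderate"] * 3 + ["low"] * 5
print(round(information(risks), 3))  # 1.531
```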
I(\text{Table 9.1}) = -\frac{6}{14}\log_2\frac{6}{14} - \frac{3}{14}\log_2\frac{3}{14} - \frac{5}{14}\log_2\frac{5}{14} = 1.531 \text{ bits}

partition of the income property:
C1: $0 to $15K  {1, 4, 7, 11}
C2: $15K to $35K  {2, 3, 12, 14}
C3: over $35K  {5, 6, 8, 9, 10, 13}
E(\text{income}) = \frac{4}{14} I(C_1) + \frac{4}{14} I(C_2) + \frac{6}{14} I(C_3)
= \frac{4}{14} \cdot 0.0 + \frac{4}{14} \cdot 1.0 + \frac{6}{14} \cdot 0.650 = 0.564 \text{ bits}

gain(income) = I(Table 9.1) - E(income) = 1.531 - 0.564 = 0.967 bits
gain(credit history) = 0.266
gain(debt) = 0.581
gain(collateral) = 0.756
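The arithmetic above can be checked directly. A minimal sketch, assuming the class make-up of each income partition from Table 9.1: {1,4,7,11} is all high risk, {2,3,12,14} is two high / two moderate, and {5,6,8,9,10,13} is one moderate / five low (counts consistent with I(C1) = 0.0, I(C2) = 1.0, I(C3) = 0.650):

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as a list of counts."""
    n = sum(counts)
    return sum(-(c / n) * log2(c / n) for c in counts if c)

i_table = entropy([6, 3, 5])              # 6 high, 3 moderate, 5 low
e_income = (4 / 14) * entropy([4]) \
         + (4 / 14) * entropy([2, 2]) \
         + (6 / 14) * entropy([1, 5])
print(round(i_table, 3), round(e_income, 3))  # 1.531 0.564
```

Computed without intermediate rounding, the gain i_table - e_income is 0.966 bits; the 0.967 above comes from subtracting the already-rounded values 1.531 and 0.564.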
Because INCOME provides the greatest
information gain, ID3 will select it as the root.
9.5 Knowledge and Learning
Similarity-based Learning
generalization is a function of similarities across training examples
biases are limited to syntactic constraints on the form of learned
knowledge
Knowledge-based Learning
the need for prior knowledge
the most effective learning occurs when the learner already has
considerable knowledge of the domain
arguments for the importance of knowledge
similarity-based learning techniques rely on relatively large amounts of
training data; in contrast, humans can form reliable generalizations
from as few as a single training instance
any set of training examples can support an unlimited number of
generalizations, most of which are irrelevant or nonsensical
9.5.2 Explanation-Based Learning
EBL
uses an explicitly represented domain theory to construct an
explanation of a training example
By generalizing from the explanation of the instance, EBL
filters noise
selects relevant aspects of experience, and
organizes training data into a systematic and coherent structure
Given
a target concept: a general specification of a goal state
a training example: an instance of the target concept
a domain theory: a set of rules and facts that are used to explain how the
training example is an instance of the goal concept
operationality criteria: some means of describing the form that concept
definitions may take
Determine
a new schema that achieves the target concept in a general way
Example
target concept : a rule used to infer whether an object is a cup
premise(X) -> cup(X)
domain theory
liftable(X) ^ holds_liquid(X) -> cup(X)
part(Z, W) ^ concave(W) ^ points_up(W) -> holds_liquid(Z)
light(Y) ^ part(Y, handle) -> liftable(Y)
small(A) -> light(A)
made_of(A, feathers) -> light(A)
training example : an instance of the goal concept
cup(obj1) , small(obj1), part(obj1, handle), owns(bob, obj1),
part(obj1, bottom), part(obj1, bowl), points_up(bowl), concave(bowl),
color(obj1, red)
operationality criteria
Target concepts must be defined in terms of observable, structural
properties such as part and points_up
Algorithm
construct an explanation of why the example is indeed an
instance of the training concept (Fig. 9.17)
proof that the target concept logically follows from the example
eliminates aspects irrelevant to the goal, such as color(obj1, red), and
captures the relevant ones
generalize the explanation to produce a concept definition
by substituting variables for constants that are part of the training
instance while retaining those constants and constraints that are part
of the domain theory
EBL defines a new rule whose
conclusion is the root of the tree
premise is the conjunction of the leaves
small(X) ^ part(X,handle) ^ part(X,W) ^ concave(W) ^ points_up(W)
-> cup(X)
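A minimal sketch of applying the learned rule: facts are ground atoms encoded as tuples, and obj2 and its facts below are hypothetical illustration data, not the training example from the text.

```python
def is_cup(x, facts):
    """small(X) ^ part(X, handle) ^ part(X, W) ^ concave(W) ^
    points_up(W) -> cup(X), checked against a set of ground facts."""
    if ("small", x) not in facts or ("part", x, "handle") not in facts:
        return False
    # find a binding for W: a part of x that is concave and points up
    parts = [f[2] for f in facts if f[0] == "part" and f[1] == x]
    return any(("concave", w) in facts and ("points_up", w) in facts
               for w in parts)

# hypothetical facts about a new object, obj2
facts = {("small", "obj2"), ("part", "obj2", "handle"),
         ("part", "obj2", "bowl"), ("concave", "bowl"),
         ("points_up", "bowl"), ("color", "obj2", "blue")}
print(is_cup("obj2", facts))  # True
```

Note how the irrelevant fact color(obj2, blue) plays no role, just as the explanation step discards color(obj1, red).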
Proof that an object, X, is a cup (Fig. 9.17)
Benefits of EBL
selects the relevant aspects of the training instance using the
domain theory
forms generalizations that are relevant to specific goals and
guaranteed to be logically consistent with the domain theory
learns from a single instance
hypothesizes unstated relationships between its goals and its
experience by constructing an explanation
9.5.3 EBL and Knowledge-Level Learning
Issues in EBL
Objection
EBL cannot make the learner do anything new
EBL can only learn rules within the deductive closure of its existing
theory
the sole function of the training instance is to focus the theorem prover on
relevant aspects of the problem domain
viewed as a form of speed-up learning or knowledge-base
reformulation
Responses to this objection
it takes information implicit in a set of rules and makes it explicit
(e.g., the game of chess)
to focus on techniques for refining incomplete theories
development of heuristics for reasoning with imperfect theories, etc.
to focus on integrating EBL and SBL
EBL refines training data where the theory applies
SBL further generalizes the partially generalized data
9.6 Unsupervised Learning
Supervised vs. Unsupervised learning
supervised learning
assumes the existence of a teacher, a fitness function, or some other
external method of classifying training instances
unsupervised learning
eliminates the teacher
the learner forms and evaluates concepts on its own
The best example of unsupervised learning is human scientific discovery,
in which scientists
propose hypotheses to explain observations
evaluate their hypotheses using such criteria as simplicity,
generality, and elegance, and
test their hypotheses through experiments of their own design
9.6.2 Conceptual Clustering
Given
a collection of unclassified objects
some means of measuring the similarity of objects
Goal
organizing the objects into classes that meet some standard of
quality, such as maximizing the similarity of objects in the same class
Numeric taxonomy
the oldest approach to the clustering problem
represents an object as a collection of features (a vector of n feature
values)
similarity metric: the Euclidean distance between objects
builds clusters in a bottom-up fashion
Agglomerative Clustering Algorithm
step 1
examine all pairs of objects
select the pair with the highest degree of similarity
make the pair a cluster
step 2
define the features of the cluster as some function of the features of the
component members
replace the component objects with the cluster definition
step 3
repeat the process on the collection of objects until all objects have
been reduced to a single cluster
The result of the algorithm is a binary tree whose leaf nodes are
instances and whose internal nodes are clusters of increasing size
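The three steps above can be sketched for one-dimensional data, assuming the "cluster definition" of step 2 is simply the mean of the members' feature values (an illustrative choice):

```python
# A minimal agglomerative-clustering sketch for points on the real line.
# The merge history is recorded as nested pairs, i.e. a binary tree
# whose leaves are the original instances.
def agglomerate(objects):
    # each entry: (feature value, tree); leaves are the objects themselves
    clusters = [(x, x) for x in objects]
    while len(clusters) > 1:
        # step 1: pick the most similar (closest) pair of clusters
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda p: abs(clusters[p[0]][0] - clusters[p[1]][0]))
        (xi, ti), (xj, tj) = clusters[i], clusters[j]
        # step 2: the cluster's feature is a function (here the mean)
        # of its components, which it replaces in the collection
        merged = ((xi + xj) / 2, (ti, tj))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][1]   # step 3 has reduced everything to one tree

print(agglomerate([1.0, 1.2, 5.0, 8.0]))  # ((1.0, 1.2), (5.0, 8.0))
```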
Hierarchical Clustering
Agglomerative approach vs. Divisive approach
[Figure: five objects a, b, c, d, e are merged stepwise into ab, de, cde, and finally abcde; the agglomerative (bottom-up) approach runs steps 0 through 4, while the divisive (top-down) approach runs the same steps in reverse.]
Agglomerative Hierarchical Clustering
Methods for measuring the similarity between clusters
Single-Link
similarity between two clusters = the similarity of the two closest
data points, one from each cluster
Complete-Link
similarity between two clusters = the similarity of the two farthest
data points, one from each cluster
Group-Averaging
a “compromise” between Single-Link and Complete-Link
Single-Link: for good local coherence!
sim(C_1, C_2) = max( sim(object in C_1, object in C_2) )
Complete-Link: for good global cluster quality!
sim(C_1, C_2) = min( sim(object in C_1, object in C_2) )
Group Averaging
not the maximum similarity of two data points, one from each cluster
not the minimum similarity of two data points, one from each cluster
but the average value over all pairs of data points, one from each
cluster!
Efficiency? Single-Link < Group Averaging < Complete-Link
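The three measures can be sketched for clusters of one-dimensional points, taking similarity to be negative distance (an illustrative choice, not the only possible metric):

```python
from itertools import product

def sim(a, b):
    """Similarity of two points: negative absolute distance."""
    return -abs(a - b)

def single_link(c1, c2):      # similarity of the closest pair
    return max(sim(a, b) for a, b in product(c1, c2))

def complete_link(c1, c2):    # similarity of the farthest pair
    return min(sim(a, b) for a, b in product(c1, c2))

def group_average(c1, c2):    # average over all cross-cluster pairs
    pairs = [sim(a, b) for a, b in product(c1, c2)]
    return sum(pairs) / len(pairs)

c1, c2 = [0.0, 1.0], [3.0, 6.0]
print(single_link(c1, c2), complete_link(c1, c2), group_average(c1, c2))
# -2.0 -6.0 -4.0
```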
K-means Clustering
K-means algorithm
[Figure: step-by-step illustration of the K-means algorithm, shown over two slides]
K-means algorithm
1) Choose k starting points (cluster centers) at random.
2) Assign each data point to the cluster of the nearest of the k points.
3) Recompute the k points from the data assigned to each; if the
points do not change, stop clustering.
4) Go to step 2).
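Steps 1) through 4) can be sketched as follows for one-dimensional data; the data set below is invented for illustration.

```python
import random

def kmeans(data, k, seed=0):
    random.seed(seed)
    centers = random.sample(data, k)          # 1) k random starting points
    while True:
        clusters = [[] for _ in range(k)]
        for x in data:                        # 2) assign to nearest center
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        new = [sum(c) / len(c) if c else centers[i]   # 3) recompute means
               for i, c in enumerate(clusters)]
        if new == centers:                    # 3) no change: stop
            return sorted(new)
        centers = new                         # 4) repeat from step 2

data = [1.0, 1.1, 0.9, 8.0, 8.2, 7.8]
print(kmeans(data, 2))
```

With this data the two centers settle on the group means, 1.0 and 8.0, regardless of which two starting points are drawn.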
Characteristics of the K-means algorithm
fast and easy to implement
the k points must be decided in advance
usable only for data types for which a “centroid” (mean) can be computed
with an inappropriate value of k, meaningless clusters may be produced,
or clustering may never complete
what if k = 4?