9.3 The ID3 Decision Tree Induction Algorithm

- ID3 induces concepts from examples.
- ID3 represents concepts as decision trees.
- Decision tree: a representation that allows us to determine the classification of an object by testing its values for certain properties.
- Example problem: estimating an individual's credit risk on the basis of credit history, current debt, collateral, and income.
  - Table 9.1 lists a sample of individuals with known credit risks.
  - The decision tree of Fig. 9.13 represents the classifications in Table 9.1.
Data from credit history of loan applications (Table 9.1)
A decision tree for credit risk assessment (Fig. 9.13)
- In a decision tree:
  - Each internal node represents a test on some property, such as credit history or debt.
  - Each possible value of the property corresponds to a branch of the tree, such as high or low.
  - Leaf nodes represent classifications, such as low or moderate risk.
- An individual of unknown type may be classified by traversing the decision tree, as the sketch below illustrates.
- The size of the tree necessary to classify a given set of examples varies according to the order in which properties are tested.
  - Fig. 9.14 shows a tree that is simpler than Fig. 9.13 but that also classifies the examples in Table 9.1.
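To make the traversal concrete, the sketch below classifies an individual by walking a small nested-dictionary tree. Only the INCOME root and the fact that the $0-to-$15k branch leads to high risk are taken from this section; the remaining branches and the sample record are hypothetical placeholders.

```python
# A minimal sketch of classifying an individual by traversing a decision tree.
# Only the income root and its "$0 to $15k" -> "high risk" branch come from the
# slides; the other branches and the sample record are hypothetical placeholders.

tree = {
    "test": "income",
    "branches": {
        "$0 to $15k": "high risk",        # stated in the slides
        "$15k to $35k": {                 # hypothetical sub-tree
            "test": "credit history",
            "branches": {"bad": "high risk", "unknown": "moderate risk", "good": "moderate risk"},
        },
        "over $35k": "low risk",          # hypothetical leaf
    },
}

def classify(node, individual):
    """Follow the branch matching the individual's value for each test until a leaf is reached."""
    while isinstance(node, dict):
        node = node["branches"][individual[node["test"]]]
    return node

print(classify(tree, {"income": "$0 to $15k"}))  # -> high risk
```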
A simplified decision tree (Fig. 9.14)
- Choice of the optimal tree
  - Measure: the greatest likelihood of correctly classifying unseen data.
  - Assumption of the ID3 algorithm: "the simplest decision tree that covers all the training examples" is the optimal tree.
  - The rationale for this assumption is the time-honored heuristic of preferring simplicity and avoiding unnecessary assumptions, i.e., Occam's Razor:
    "It is vain to do with more what can be done with less.... Entities should not be multiplied beyond necessity."
9.3.1 Top-down Decision Tree Induction

- The ID3 algorithm
  - constructs the decision tree in a top-down fashion,
  - selects a property to test at the current node of the tree,
  - uses the property to partition the set of examples,
  - recursively constructs a subtree for each partition, and
  - continues until all members of a partition are in the same class.
- Because the order of tests is critical, ID3 relies on its criterion for selecting the test.
- For example, ID3 constructs Fig. 9.14 from Table 9.1 as follows:
  - ID3 selects INCOME as the root property => Fig. 9.15.
  - The partition {1,4,7,11} consists entirely of high-risk individuals, and CREDIT HISTORY further divides the partition {2,3,12,14} into {2,3}, {14}, and {12} => Fig. 9.16.
Decision tree construction algorithm (figure)
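As a companion to the construction-algorithm figure, here is a minimal Python sketch of the top-down procedure of 9.3.1, under assumptions introduced for illustration: each example is a dictionary mapping property names to values, the class label is stored under the key named by `target`, and `score` is the test-selection criterion (for ID3, the information gain defined in Section 9.3.2).

```python
from collections import Counter

def id3(examples, properties, target, score):
    """Top-down induction: select the best-scoring property, partition the
    examples on its values, and recursively build a subtree for each partition."""
    classes = {ex[target] for ex in examples}
    if len(classes) == 1:              # all members of the partition are in the same class
        return classes.pop()
    if not properties:                 # no tests left: fall back to the majority class
        return Counter(ex[target] for ex in examples).most_common(1)[0][0]
    best = max(properties, key=lambda p: score(examples, p, target))
    remaining = [p for p in properties if p != best]
    tree = {"test": best, "branches": {}}
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        tree["branches"][value] = id3(subset, remaining, target, score)
    return tree
```

With Table 9.1 loaded as such dictionaries (the "risk" label and property-name keys are assumed column names) and an information-gain `score`, `id3(examples, ["credit history", "debt", "collateral", "income"], "risk", score)` would select INCOME at the root, as in Fig. 9.15.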
A partially constructed decision tree (Fig. 9.15)
Another partially constructed decision tree (Fig. 9.16)
9.3.2 Information Theoretic Test Selection

- Test selection method
  - Strategy: use information theory to select the test (property).
  - Procedure: measure the information gain of each candidate property and pick the property providing the greatest information gain.
- Information gain from property P:

$$\mathrm{gain}(P) = I(C) - E(P)$$

where I(C) is the total information content of the tree and E(P) is the expected information needed to complete the tree after testing P:

$$I(C) = \sum_{i} -p(\text{class}_i)\,\log_2 p(\text{class}_i), \qquad E(P) = \sum_{i=1}^{n} \frac{|C_i|}{|C|}\, I(C_i)$$

Here C is the set of training instances, n is the number of values of property P, {C_1, C_2, ..., C_n} is the partition of C induced by those values, and p(class_i) is the proportion of instances in a set that belong to class i.
For the examples in Table 9.1 (the 14 examples fall into three risk classes with 6, 3, and 5 members):

$$I(\text{Table 9.1}) = -\frac{6}{14}\log_2\frac{6}{14} - \frac{3}{14}\log_2\frac{3}{14} - \frac{5}{14}\log_2\frac{5}{14} = 1.531 \text{ bits}$$

Partition induced by the income property:
- C1 ($0 to $15k) = {1, 4, 7, 11}
- C2 ($15k to $35k) = {2, 3, 12, 14}
- C3 (over $35k) = {5, 6, 8, 9, 10, 13}
The expected information needed to complete the tree after testing INCOME:

$$E(\text{income}) = \frac{4}{14} I(C_1) + \frac{4}{14} I(C_2) + \frac{6}{14} I(C_3) = \frac{4}{14}\times 0.0 + \frac{4}{14}\times 1.0 + \frac{6}{14}\times 0.650 = 0.564 \text{ bits}$$

$$\mathrm{gain}(\text{income}) = I(\text{Table 9.1}) - E(\text{income}) = 1.531 - 0.564 = 0.967 \text{ bits}$$

Similarly,

$$\mathrm{gain}(\text{credit history}) = 0.266, \quad \mathrm{gain}(\text{debt}) = 0.581, \quad \mathrm{gain}(\text{collateral}) = 0.756$$

Because INCOME provides the greatest information gain, ID3 will select it as the root.
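A short Python check of these numbers, assuming the class counts implied by the slides: C1 is entirely high-risk, and the quoted entropies of 1.0 and 0.650 bits imply a 2/2 split for C2 and a 5/1 split for C3 (the exact per-example labels come from Table 9.1, which is not listed here).

```python
from math import log2

def info(counts):
    """I(C): information content, in bits, of a set with the given class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

# Class counts for Table 9.1 and for the partition induced by income.
# The overall 6/3/5 split and the all-high-risk C1 are stated on the slides;
# the C2 and C3 splits are inferred from the quoted entropies (1.0 and 0.650 bits).
whole = [6, 3, 5]          # 14 examples in three risk classes
partition = [[4, 0, 0],    # C1 = {1, 4, 7, 11}
             [2, 2, 0],    # C2 = {2, 3, 12, 14}
             [1, 5, 0]]    # C3 = {5, 6, 8, 9, 10, 13}

i_c = info(whole)
e_income = sum(sum(ci) / sum(whole) * info(ci) for ci in partition)
print(round(i_c, 3), round(e_income, 3), round(i_c - e_income, 3))
# -> 1.531 0.564 0.966 (the slide's 0.967 comes from subtracting the rounded values)
```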
9.5 Knowledge and Learning

- Similarity-based learning
  - Generalization is a function of similarities across training examples.
  - Biases are limited to syntactic constraints on the form of learned knowledge.
- Knowledge-based learning
  - The need for prior knowledge: the most effective learning occurs when the learner already has considerable knowledge of the domain.
  - Arguments for the importance of knowledge:
    - Similarity-based learning techniques rely on relatively large amounts of training data. In contrast, humans can form reliable generalizations from as few as a single training instance.
    - Any set of training examples can support an unlimited number of generalizations, most of which are irrelevant or nonsensical.
9.5.2 Explanation-Based Learning

- EBL uses an explicitly represented domain theory to construct an explanation of a training example.
- By generalizing from the explanation of the instance, EBL can
  - filter noise,
  - select relevant aspects of experience, and
  - organize training data into a systematic and coherent structure.
- Given:
  - A target concept: a general specification of a goal state.
  - A training example: an instance of the target.
  - A domain theory: a set of rules and facts that are used to explain how the training example is an instance of the goal concept.
  - Operationality criteria: some means of describing the form that concept definitions may take.
- Determine:
  - A new schema that achieves the target concept in a general way.
- Example
  - Target concept: a rule used to infer whether an object is a cup.
    - premise(X) -> cup(X)
  - Domain theory:
    - liftable(X) ^ holds_liquid(X) -> cup(X)
    - part(Z, W) ^ concave(W) ^ points_up(W) -> holds_liquid(Z)
    - light(Y) ^ part(Y, handle) -> liftable(Y)
    - small(A) -> light(A)
    - made_of(A, feathers) -> light(A)
  - Training example: an instance of the goal concept.
    - cup(obj1), small(obj1), part(obj1, handle), owns(bob, obj1), part(obj1, bottom), part(obj1, bowl), points_up(bowl), concave(bowl), color(obj1, red)
  - Operationality criteria: target concepts must be defined in terms of observable, structural properties such as part and points_up.
- Algorithm
  - Construct an explanation of why the example is indeed an instance of the target concept (Fig. 9.17):
    - a proof that the target concept logically follows from the example;
    - this eliminates concepts irrelevant to the goal, such as color(obj1, red), and captures the relevant ones.
  - Generalize the explanation to produce a concept definition:
    - substitute variables for the constants that are part of the training instance, while retaining the constants and constraints that are part of the domain theory.
  - EBL defines a new rule whose
    - conclusion is the root of the generalized proof tree, and
    - premise is the conjunction of its leaves:

    small(X) ^ part(X, handle) ^ part(X, W) ^ concave(W) ^ points_up(W) -> cup(X)
Proof that an object, X, is a cup (Fig. 9.17)
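To make the generalization step concrete, here is a minimal Python sketch: constants that came from the training instance (obj1, bowl) are replaced by variables, while constants contributed by the domain theory (handle) are retained. The explicit leaf list and the tuple representation of literals are assumptions made for illustration; in full EBL the substitution falls out of goal regression over the proof rather than being written by hand.

```python
# Sketch of EBL's generalization step: replace constants introduced by the
# training instance (obj1, bowl) with variables, keep domain-theory constants (handle).

# Leaves of the explanation of cup(obj1), following Fig. 9.17.
leaves = [
    ("small", "obj1"),
    ("part", "obj1", "handle"),
    ("part", "obj1", "bowl"),
    ("concave", "bowl"),
    ("points_up", "bowl"),
]
root = ("cup", "obj1")

# Training-instance constants and the variables that replace them.
substitution = {"obj1": "X", "bowl": "W"}

def generalize(literal, subst):
    """Apply the constant-to-variable substitution to one predicate."""
    name, *args = literal
    return (name, *[subst.get(a, a) for a in args])

premise = " ^ ".join(f"{p[0]}({', '.join(p[1:])})"
                     for p in (generalize(l, substitution) for l in leaves))
conclusion = "{0}({1})".format(*generalize(root, substitution))
print(f"{premise} -> {conclusion}")
# -> small(X) ^ part(X, handle) ^ part(X, W) ^ concave(W) ^ points_up(W) -> cup(X)
```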
- Benefits of EBL
  - It selects the relevant aspects of the training instance using the domain theory.
  - It forms generalizations that are relevant to specific goals and that are guaranteed to be logically consistent with the domain theory.
  - It can learn from a single instance.
  - It can hypothesize unstated relationships between its goals and its experience by constructing an explanation.
9.5.3 EBL and Knowledge-Level Learning

- Issues in EBL
  - Objection
    - EBL cannot make the learner do anything new: it only learns rules within the deductive closure of its existing theory.
    - The sole function of the training instance is to focus the theorem prover on relevant aspects of the problem domain.
    - EBL is therefore viewed as a form of speed-up learning or knowledge-base reformulation.
  - Responses to this objection
    - EBL takes information implicit in a set of rules and makes it explicit (e.g., in a chess game).
    - Focus on techniques for refining incomplete theories, such as developing heuristics for reasoning with imperfect theories.
    - Focus on integrating EBL and SBL: EBL refines the training data where the theory applies, and SBL further generalizes the partially generalized data.
9.6 Unsupervised Learning

- Supervised vs. unsupervised learning
  - Supervised learning assumes the existence of a teacher, a fitness function, or some other external method of classifying training instances.
  - Unsupervised learning eliminates the teacher; the learner must form and evaluate concepts on its own.
- The best example of unsupervised learning is human scientific discovery: scientists
  - propose hypotheses to explain observations,
  - evaluate their hypotheses using such criteria as simplicity, generality, and elegance, and
  - test hypotheses through experiments of their own design.
9.6.2 Conceptual Clustering

- Given
  - a collection of unclassified objects, and
  - some means of measuring the similarity of objects.
- Goal
  - Organize the objects into classes that meet some standard of quality, such as maximizing the similarity of objects within a class.
- Numeric taxonomy
  - The oldest approach to the clustering problem.
  - Represents an object as a collection of features (a vector of n feature values).
  - Similarity metric: the Euclidean distance between objects.
  - Builds clusters in a bottom-up fashion.
- Agglomerative clustering algorithm
  - Step 1
    - Examine all pairs of objects, select the pair with the highest degree of similarity, and make that pair a cluster.
  - Step 2
    - Define the features of the cluster as some function of the features of the component members, and replace the component objects with the cluster definition.
  - Step 3
    - Repeat the process on the collection of objects until all objects have been reduced to a single cluster.
  - The result of the algorithm is a binary tree whose leaf nodes are instances and whose internal nodes are clusters of increasing size. A minimal sketch of this procedure follows.
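A minimal Python sketch of this bottom-up procedure, under assumptions made for illustration: objects are numeric feature vectors, similarity is negative Euclidean distance, and a merged cluster is represented by the mean of the two merged representatives (one possible "function of the features of the component members").

```python
from itertools import combinations
import math

def similarity(x, y):
    """Negative Euclidean distance: larger means more similar."""
    return -math.dist(x, y)

def agglomerate(objects):
    """Repeatedly merge the most similar pair until one cluster remains.
    Returns a binary tree: leaves are the original objects, internal nodes
    are (left_subtree, right_subtree) pairs."""
    # Each active cluster: (tree, representative feature vector).
    clusters = [(obj, obj) for obj in objects]
    while len(clusters) > 1:
        # Step 1: find the most similar pair of clusters.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: similarity(clusters[ij[0]][1], clusters[ij[1]][1]))
        (tree_i, rep_i), (tree_j, rep_j) = clusters[i], clusters[j]
        # Step 2: represent the new cluster by the mean of the two representatives.
        merged_rep = tuple((a + b) / 2 for a, b in zip(rep_i, rep_j))
        # Step 3: replace the pair with the merged cluster and repeat.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(((tree_i, tree_j), merged_rep))
    return clusters[0][0]

print(agglomerate([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]))
# -> ((5.0, 5.0), ((0.0, 0.0), (0.1, 0.0))) -- the two nearby points were merged first
```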
Hierarchical Clustering

- Agglomerative approach vs. divisive approach
- Figure: the agglomerative (bottom-up) approach merges objects a, b, c, d, e step by step into ab, de, cde, and finally abcde; the divisive (top-down) approach splits abcde back apart in the reverse order of steps.
Agglomerative Hierarchical Clustering

- Methods for measuring the similarity between two clusters
  - Single-link
    - Similarity between two clusters = the similarity of the two closest objects, one from each cluster.
  - Complete-link
    - Similarity between two clusters = the similarity of the two most distant objects, one from each cluster.
  - Group-averaging
    - A "compromise" between single-link and complete-link.
- Single-link: for good local coherence!

$$\mathrm{sim}(C_1, C_2) = \max_{x \in C_1,\; y \in C_2} \mathrm{sim}(x, y)$$
- Complete-link: for good global cluster quality!

$$\mathrm{sim}(C_1, C_2) = \min_{x \in C_1,\; y \in C_2} \mathrm{sim}(x, y)$$
- Group-averaging
  - Not the maximum similarity of two objects, one from each cluster.
  - Not the minimum similarity of two objects, one from each cluster.
  - But the average similarity over all pairs of objects, one drawn from each cluster.
- Efficiency: Single-Link < Group-Averaging < Complete-Link
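The three cluster-similarity measures can be written compactly; a sketch follows, assuming `sim` is an object-level similarity function (e.g., negative Euclidean distance, as in the earlier agglomerative sketch).

```python
from statistics import mean
import math

def single_link(c1, c2, sim):
    """Similarity of the closest cross-cluster pair: good local coherence."""
    return max(sim(x, y) for x in c1 for y in c2)

def complete_link(c1, c2, sim):
    """Similarity of the most distant cross-cluster pair: good global cluster quality."""
    return min(sim(x, y) for x in c1 for y in c2)

def group_average(c1, c2, sim):
    """Average similarity over all cross-cluster pairs: the compromise."""
    return mean(sim(x, y) for x in c1 for y in c2)

a, b = [(0.0, 0.0), (1.0, 0.0)], [(4.0, 0.0), (9.0, 0.0)]
sim = lambda x, y: -math.dist(x, y)
print(single_link(a, b, sim), complete_link(a, b, sim), group_average(a, b, sim))
# -> -3.0 -9.0 -6.0
```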
K-means Clustering
- The K-means algorithm (a sketch of this loop follows)
  1) Choose k initial points (cluster centers) at random.
  2) Assign each data point to the cluster whose center is closest.
  3) Recompute the k centers from the data assigned to each; if the centers do not change, stop.
  4) Go back to step 2).
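A minimal Python sketch of these four steps, assuming numeric feature vectors and Euclidean distance; the data and the value of k are illustrative.

```python
import math
import random

def kmeans(data, k, seed=0):
    """Steps 1-4 of the K-means loop described above."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)                      # 1) k random starting points
    while True:
        clusters = [[] for _ in range(k)]
        for x in data:                                 # 2) assign each point to the nearest center
            nearest = min(range(k), key=lambda i: math.dist(x, centers[i]))
            clusters[nearest].append(x)
        new_centers = [                                # 3) recompute each center as its cluster mean
            tuple(sum(coords) / len(c) for coords in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:                     #    stop if the centers did not change
            return clusters, centers
        centers = new_centers                          # 4) repeat from step 2

data = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
clusters, centers = kmeans(data, k=2)
print(centers)   # two centers, one near (0.1, 0.05) and one near (5.05, 4.95)
```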
- Characteristics of the K-means algorithm
  - It is fast and easy to implement.
  - The value of k must be decided in advance.
  - It can be used only for data for which a "center" (mean) can be computed.
  - Given an inappropriate value of k, meaningless clusters may be produced, or the clustering may not complete properly.
    - What if k = 4?