Inductive Learning


Advanced Computing Seminar
Data Mining and Its Industrial
Applications
— Chapter 4 —
Inductive Learning
Zhongzhi Shi, Markus Stumptner, Yalei Hao, Gerald Quirchmayr
Knowledge and Software Engineering Lab
Advanced Computing Research Centre
School of Computer and Information Science
University of South Australia
Outline
• Introduction
• Machine learning
• Version space and bias
• Decision tree learning
• Ripper algorithm
• Summary
Basic Concepts
• Data: stored on some medium in a certain format
• Information: meaning assigned to concrete data
• Knowledge: refined from information
Why Data Mining?
Data: finance, economics, government, postal, population, life cycle
    ↓
Knowledge: patterns, trends, concepts, relations, models, association rules, sequences
    ↓
Decision Making: e-commerce, resource distribution, trade, business intelligence, e-science

Rich Data, Poor Knowledge
Data Mining vs Knowledge Discovery
• Data mining: extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
• Also known as: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
Data Mining: A KDD Process
• Data mining is the core of the knowledge discovery process
• KDD process (figure omitted): Databases → Data Integration / Data Cleaning → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation
Data Warehouse Process
Process stages: Organization Readiness Assessment → Business Strategy Definition → Data Warehouse Architecture Definition → Data Warehouse Infrastructure Design → Design and Build → Implementation → Data Exploitation
Supporting concerns:
• Metadata management
• Data access
• Systems integration
Macro Picture
Data Mining Approach to Data Warehouse Design (figure omitted): desired star schema + mapping rules → designed star schema
Attribute properties examined:
• General: name, type, width, key, NULL allowed
• Numeric fields: maximum, minimum, average, standard deviation
• Text fields: number of spaces, numerals used, average length
Detailed picture
Components (figure omitted): info sources, extractor, translator, attribute classifier, similarity calculator, integrator; input: desired star schema; outputs: mapping rules and designed star schema.
Knowledge Representation
• Production system
• Frame
• Semantic networks
• First order logic
• Ontology
Production System
• Rules have the form:
  IF (conditions) THEN (conclusions)
• Example:
  IF (animal has wings) AND (animal can fly)
  THEN (animal is a bird)
Production System
MYCIN
<rule>               ::= IF <antecedent> THEN <action> (ELSE <action>)
<antecedent>         ::= AND <condition>
<condition>          ::= OR <condition> | <predicate> <associative-triple>
<associative-triple> ::= <attribute> <object> <value>
<action>             ::= <consequent> | <procedure>
<consequent>         ::= <associative-triple> <certainty-factor>
Frame Structure
FRAME FRAME-NAME
  SLOT-NAME-1: ASPECT-11 ASPECT-VALUE-11
               ASPECT-12 ASPECT-VALUE-12
               ......
               ASPECT-1m ASPECT-VALUE-1m
  ......
  SLOT-NAME-n: ASPECT-n1 ASPECT-VALUE-n1
               ASPECT-n2 ASPECT-VALUE-n2
               ......
Semantic Networks
• Nodes represent objects
• Arcs represent relationships
First Order Logic
• Facts: Student(John), Teacher(Markus)
• Predicates: Father(x,y), Father(y,z)
• Rule: Grandfather(x,z) :- Father(x,y), Father(y,z)
• Example rule: IF (animal has wings) AND (animal can fly) THEN (animal is a bird)
Ontology
Semantic Web:
• Ontology
• OWL
• Ontology schema
• Description Logic
Outline
• Introduction
• Machine learning
• Version space and bias
• Decision tree learning
• Ripper algorithm
• Summary
The Essence of Learning
• "Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more efficiently and more effectively the next time." [Simon 1983]
• Machine learning is the study of how to make machines acquire new knowledge and new skills, and reorganize existing knowledge.
The Essence of Learning
Learning system (figure omitted): Environment → Learning Element → Knowledge Base → Performance Element, with a feedback loop.
• The environment supplies the source information to the learning system. The level and quality of this information significantly affect the learning strategy.
The Essence of Learning
• The environment = information source
  - Database
  - Text
  - Web pages
  - Image
  - Video
  - Spatial data
The Essence of Learning
• The learning element uses this information to make improvements in an explicit knowledge base, and the performance element uses the knowledge base to perform its task.
  - Inductive learning
  - Analogical learning
  - Explanation learning
  - Genetic algorithms
  - Neural networks
Paradigms for Machine Learning
• The inductive paradigm
  The most widely studied method for symbolic learning is that of inducing a general concept description from a sequence of instances of the concept and known counterexamples of the concept. The task is to build a concept description from which all the previous positive instances can be rederived by universal instantiation, but none of the previous negative instances can be rederived by the same process.
• The analogical paradigm
  Analogical reasoning is a strategy of inference that allows the transfer of knowledge from a known area into another area with similar properties.
Paradigms for Machine Learning
• The analytic paradigm
  These methods attempt to formulate a generalization after analyzing a few instances in terms of the system's knowledge. Mainly deductive rather than inductive mechanisms are used for such learning.
• The genetic paradigm
  Genetic algorithms have been inspired by a direct analogy to mutation in biological reproduction and Darwinian natural selection. In principle, genetic algorithms encode a parallel search through concept space, with each process attempting coarse-grained hill climbing.
• The connectionist paradigm
  Connectionist learning systems are also called "neural networks". Connectionist learning consists of readjusting weights in a fixed-topology network via specific learning algorithms.
The Essence of Learning
• The knowledge base contains predefined concepts, domain constraints, heuristic rules, and so on.
  - Knowledge representation
  - Knowledge consistency
  - Knowledge redundancy
The Essence of Learning
• The performance element. The learning element tries to improve the actions of the performance element. The performance element applies knowledge to solve problems and to evaluate the learning effects.
On Concept
• The term "concept" is a universal notion which reflects the general, abstract, and essential features of a class of objects. For example, "triangle", "animal", and "computer" are all concepts. Horse, tiger, bird, and so on are called examples of the concept "animal".
• A concept has two aspects: intension and extension.
  - Intension: the set of attributes which reflect the essential features of a concept.
  - Extension: the set of examples which satisfy the definition of a concept.
• Further examples: Fruit, Student
Concept Description
• In general, a concept can be described by the concept name and a list of attribute-value pairs, that is:
  (Concept-name (Attribute-1 Value-1)
                (Attribute-2 Value-2)
                ...
                (Attribute-n Value-n))
• In addition, a concept description can be represented in first-order logic: each attribute is a predicate, and the concept name and attribute value can be viewed as arguments. The concept description is then represented in predicate calculus. A simple sketch of the attribute-value form follows.
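A rough illustration of the attribute-value form above, as a small Python sketch; the "bird" concept and its attribute names are invented for illustration only.

```python
# Minimal sketch: a concept as a name plus attribute-value pairs (illustrative values).
concept = {
    "name": "bird",
    "attributes": {"has_wings": True, "can_fly": True, "body_covering": "feathers"},
}

def satisfies(example: dict, concept: dict) -> bool:
    """An example satisfies the concept if it matches every attribute-value pair."""
    return all(example.get(attr) == value for attr, value in concept["attributes"].items())

print(satisfies({"has_wings": True, "can_fly": True, "body_covering": "feathers"}, concept))  # True
print(satisfies({"has_wings": True, "can_fly": False, "body_covering": "fur"}, concept))      # False
```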
Attribute Types
• Nominal attribute: takes on a finite, unordered set of mutually exclusive values
• Linear attribute
• Structured attribute
Attribute Types
• Nominal attribute: takes on a finite, unordered set of mutually exclusive values
• Examples:
  - Color: red, green, blue
  - Traffic: airline, railway, ship
Attribute Types
• Linear attribute: takes on ordered (numeric) values
• Examples:
  - Age: 1, 2, ..., 100
  - Temperature: 20, 21, ...
  - Distance: 1 km, 2 km, ...
Attribute Types
• Structured attribute: values are organized in a tree structure
• Example: computer → {hardware, software}; hardware → {CPU, memory}; CPU → {computing, control}
Inductive Learning
• From particular examples to a general conclusion, principle, or rule
• Example:
  apple → eat
  tomato → eat
  banana → eat
  ...
  therefore: fruit → eat
Inductive Learning
Given:
• Premise statements. Facts, specific observations, and intermediate generalizations that provide information about some objects, phenomena, processes, and so on.
• Tentative inductive assertion. An a priori hypothesis held about the objects in the premise statements.
• Background knowledge. General and domain-specific concepts for interpreting the premises, and inference rules relevant to the task of inference.
Find:
• Inductive assertion (hypothesis). It strongly or weakly implies the premise statements in the context of the background knowledge and satisfies the preference criterion.
Inductive Learning
• Simplest form: learn a function from examples
  - f is the target function
  - An example is a pair (x, f(x))
  - Problem: find a hypothesis h such that h ≈ f, given a training set of examples
• (This is a highly simplified model of real learning: it ignores prior knowledge and assumes the examples are given)
Inductive Learning Method
• Construct/adjust h to agree with f on the training set
• (h is consistent if it agrees with f on all examples)
• E.g., curve fitting: successive fits of increasing complexity to the training points (figures omitted); a sketch follows below
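A minimal sketch of the curve-fitting idea above, assuming NumPy is available: hypotheses are polynomials of increasing degree fitted to a small, invented set of (x, f(x)) pairs; a sufficiently complex hypothesis becomes consistent with (agrees with f on) all training examples.

```python
import numpy as np

# Invented training set of (x, f(x)) pairs; the target f is unknown to the learner.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([0.1, 0.9, 4.2, 8.8, 16.1])

# Construct/adjust h to agree with f on the training set: fit polynomials of
# increasing degree (more complex hypotheses) and measure the training error.
for degree in (1, 2, 4):
    h = np.poly1d(np.polyfit(xs, ys, degree))
    mse = float(np.mean((h(xs) - ys) ** 2))
    print(f"degree {degree}: mean squared training error = {mse:.4f}")
```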
Best-Hypothesis
• Positive example → generalize
• Negative example → specialize
• Drawback: must check previous examples and backtrack
Outline
• Introduction
• Machine learning
• Version space and bias
• Decision tree learning
• Ripper algorithm
• Summary
Hypothesis Space
• Concept description
• Extension: the set of examples predicted to be satisfied by the hypothesis
• Bias: any preference for one hypothesis over another
Training Examples for Enjoy Sport
Sky    Temp  Humidity  Wind    Water  Forecast  EnjoySport
Sunny  Warm  Normal    Strong  Warm   Same      Yes
Sunny  Warm  High      Strong  Warm   Same      Yes
Rainy  Cold  High      Strong  Warm   Change    No
Sunny  Warm  High      Strong  Cool   Change    Yes

What is the general concept?
The more_general_than_or_equal_to Relation

• Definition: let hj and hk be boolean-valued functions defined over X. Then hj is more_general_than_or_equal_to hk (hj ≥g hk) iff
  (∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]
• In our case the most general hypothesis (that every day is a positive example) is represented by ⟨?, ?, ?, ?, ?, ?⟩, and the most specific possible hypothesis (that no day is a positive example) is represented by ⟨∅, ∅, ∅, ∅, ∅, ∅⟩.
Example of the Ordering of Hypotheses (figure omitted)
Version Space Search (figure omitted)
Version Space Example (figure omitted)
Representing Version Space
• The general boundary G of version space VS_{H,E} is the set of its maximally general members
• The specific boundary S of version space VS_{H,E} is the set of its maximally specific members
• Every member of the version space lies between these boundaries:
  VS_{H,E} = {h ∈ H | (∃s ∈ S)(∃g ∈ G) (g ≥ h ≥ s)}
  where x ≥ y means x is more general than or equal to y
Candidate-elimination algorithm
1. Initialize H to be the whole space. Thus the G set contains only the null description, and the S set is consistent with the first observed positive training instance.
2. For each subsequent instance i:
   BEGIN
   IF i is a positive instance,
   THEN BEGIN
       Retain in G only those generalizations which match i.
       Update S to generalize the elements in S as little as possible, so that they will match i.
   END
Candidate-elimination algorithm
   ELSE IF i is a negative instance,
   THEN BEGIN
       Retain in S only those generalizations which do not match i.
       Update G to specialize the elements in G as little as possible, so that they will not match i.
   END
3. Repeat step 2 until G = S and this is a singleton set. When this occurs, H has collapsed to include only a single concept.
4. Output H.
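A minimal, illustrative implementation of the algorithm above for conjunctive hypotheses over nominal attributes, run on the EnjoySport data from the earlier slide; '?' stands for "any value", None for "no acceptable value". It is a sketch, not Mitchell's exact formulation, and the attribute domains are taken only from the values appearing in that table.

```python
def covers(h, x):
    return all(a == "?" or a == v for a, v in zip(h, x))

def more_general_or_equal(h1, h2):
    """True if h1 covers every instance that h2 covers."""
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def minimally_generalize(s, x):
    """Smallest generalization of specific hypothesis s that covers instance x."""
    return tuple(v if a is None else (a if a == v else "?") for a, v in zip(s, x))

def minimal_specializations(g, x, domains):
    """Minimal specializations of general hypothesis g that exclude instance x."""
    specs = []
    for i, a in enumerate(g):
        if a == "?":
            specs.extend(g[:i] + (v,) + g[i + 1:] for v in domains[i] if v != x[i])
    return specs

def candidate_elimination(examples, domains):
    S = [(None,) * len(domains)]
    G = [("?",) * len(domains)]
    for x, positive in examples:
        if positive:
            G = [g for g in G if covers(g, x)]
            S = [minimally_generalize(s, x) for s in S]
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
        else:
            S = [s for s in S if not covers(s, x)]
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                    continue
                new_G.extend(h for h in minimal_specializations(g, x, domains)
                             if any(more_general_or_equal(h, s) for s in S))
            G = new_G
    return S, G

# EnjoySport training data and attribute domains from the earlier slide.
domains = [("Sunny", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong",), ("Warm", "Cool"), ("Same", "Change")]
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(examples, domains)
print("S boundary:", S)   # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print("G boundary:", G)   # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```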
Converging Boundaries of the G and S Sets (figure omitted)
Example Trace (1)-(4) (figures omitted)
How to Classify New Instances?

• A new instance i is classified as positive if every hypothesis in the current version space classifies it as positive.
  - Efficient test: iff the instance satisfies every member of S
• A new instance i is classified as negative if every hypothesis in the current version space classifies it as negative.
  - Efficient test: iff the instance satisfies none of the members of G
New Instances to be Classified
A: ⟨Sunny, Warm, Normal, Strong, Cool, Change⟩   (YES)
B: ⟨Rainy, Cold, Normal, Light, Warm, Same⟩      (NO)
C: ⟨Sunny, Warm, Normal, Light, Warm, Same⟩      (Ppos(C) = 3/6)
D: ⟨Sunny, Cold, Normal, Strong, Warm, Same⟩     (Ppos(D) = 2/6)
Remarks on Version Space and Candidate-Elimination

• The algorithm outputs a set of all hypotheses consistent with the training examples
  - iff there are no errors in the training data
  - iff there is some hypothesis in H that correctly describes the target concept
• The target concept is exactly learned when the S and G boundary sets converge to a single identical hypothesis
• Applications
  - learning regularities in chemical mass spectroscopy
  - learning control rules for heuristic search
Drawbacks of Version Space
• Assumes consistent training data
• Noise-sensitive
• Comment: though not practical for most real-world learning problems, version spaces provide a good deal of insight into the logical structure of hypothesis space
Version-Space Merging
Figure (omitted): two version spaces VS1 (boundaries G1, S1) and VS2 (boundaries G2, S2) merged into VS1∩2 with boundaries G1∩2 and S1∩2.
Version-Space Merging
• Conceptual: each new piece of information yields a new version space
• Practical: can be parallelized; handles ambiguous or inconsistent data and background domain theories
• Figure (omitted): version spaces VS1, ..., VSn are merged into VSM
IVSM Examples
Attribute hierarchies (figure omitted):
• shape: any-shape → {polyhedron, spheroid}; polyhedron → {pyramid, cube, octahedron}
• size: any-size → {large, small}
IVSM Examples
Columns: Example | Instance S | Instance G | Resulting S | Resulting G
Row 1: + [S,C]  | [S,C] | [?,?]        | [S,C]              | [?,?]
Row 2: X [S,Sp] | f     | [L,?] [?,Po] | [S,C]              | [?,Po]
Row 3: X [L,O]  | f     | [S,?] [?,C]  | [?,Py] [?,C] [S,C] | [?,C] [S,Py]
Row 4: + [S,P]  | [S,P] | [?,?]        | [S,C] [S,Py]       | [S,Po]
Bias
• Definition
  - any basis for choosing one generalization over another
  - any factor that influences the definition or selection of inductive hypotheses
• Representational bias
  - language, language implementation, primitive terms
• Procedural (algorithmic) bias
  - order of traversal of the states in the space defined by a representational bias
Bias
Figure (omitted): bias, background knowledge, and the training set of examples feed the learning program's search for a hypothesis.
Bias Selection & Evaluation
• Real-world domains have potentially hundreds of features and sources of data
• Why is bias selection important?
  - improves the predictive accuracy of the learner
  - improves performance goals
• Selection: static vs. dynamic
• Evaluation: the basis for bias selection
  - online and empirical vs. offline and analytical
Multi-Tiered Bias System
• Bias shifting
  - bias selection occurs again after learning has begun
  - useful when the knowledge for bias selection is not available prior to learning, but can be gathered during learning
• Multi-tiered bias
  - makes embedded biases explicit
  - reduces the cost of system and knowledge engineering
  - flexible system design, conceptual simplicity
  - characterizes learning as search within multiple tiers
Multi-Tiered Bias Search Space
Figure (omitted): multi-tiered bias search space, with the hypothesis space H at the base, a representational bias space L(H) and procedural bias space P(L(H)) above it, and further representational and procedural meta-bias spaces (e.g. L(L(H)), P(L(L(H)))) above those.
Outline
• Introduction
• Machine learning
• Version space and bias
• Decision tree learning
• Ripper algorithm
• Summary
Decision Tree Learning
1966  Hunt, Marin, Stone: CLS
1983  Quinlan: ID3
1986  Schlimmer, Fisher: ID4 (incremental learning)
1988  Utgoff: ID5
1993  Quinlan: C4.5, C5
Play tennis: Training examples
Day   Outlook    Temperature  Humidity  Wind    Play Tennis
D1    Sunny      Hot          High      Weak    No
D2    Sunny      Hot          High      Strong  No
D3    Overcast   Hot          High      Weak    Yes
D4    Rain       Mild         High      Weak    Yes
D5    Rain       Cool         Normal    Weak    Yes
D6    Rain       Cool         Normal    Strong  No
D7    Overcast   Cool         Normal    Strong  Yes
D8    Sunny      Mild         High      Weak    No
D9    Sunny      Cool         Normal    Weak    Yes
D10   Rain       Mild         Normal    Weak    Yes
D11   Sunny      Mild         Normal    Strong  Yes
D12   Overcast   Mild         High      Strong  Yes
D13   Overcast   Hot          Normal    Weak    Yes
D14   Rain       Mild         High      Strong  No
CLS learning algorithm
• Decision tree
  - each internal node tests an attribute
  - each branch corresponds to an attribute value
  - each leaf node assigns a classification
• Decision trees are inherently disjunctive, since each branch leaving a decision node corresponds to a separate disjunctive case. Decision trees can therefore be used to represent disjunctive concepts.
CLS learning algorithm
The CLS algorithm starts with an empty decision tree and gradually refines it, by adding decision nodes, until the tree correctly classifies all the training instances. The algorithm operates over a set of training instances, C, as follows:
1. If all instances in C are positive, create a YES node and halt. If all instances in C are negative, create a NO node and halt. Otherwise, select (using some heuristic criterion) an attribute A with values v1, ..., vn and create a decision node for it.
2. Partition the training instances in C into subsets C1, ..., Cn according to the values of A.
3. Apply the algorithm recursively to each of the sets Ci.
ID3 Approach
• ID3 algorithm
  - builds a decision tree from training objects with known class labels, to classify testing objects
  - ranks attributes with the information gain measure
  - aims for minimal height: the least number of tests needed to classify an object
Decision Tree Representation
• Representation:
  - each internal node tests some property (attribute)
  - each branch corresponds to an attribute value
  - each leaf node assigns a classification
• Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances:
  (Outlook = Sunny ∧ Humidity = Normal)
  ∨ (Outlook = Overcast)
  ∨ (Outlook = Rain ∧ Wind = Weak)
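To make the disjunctive reading concrete, here is that tree written as nested tests in Python (a sketch; attribute values are passed as plain strings):

```python
# Sketch: the PlayTennis tree from the slide as nested attribute tests; the
# returned value is the classification at the corresponding leaf.
def play_tennis(outlook: str, humidity: str, wind: str) -> bool:
    if outlook == "Sunny":
        return humidity == "Normal"        # (Outlook = Sunny AND Humidity = Normal)
    if outlook == "Overcast":
        return True                        # (Outlook = Overcast)
    if outlook == "Rain":
        return wind == "Weak"              # (Outlook = Rain AND Wind = Weak)
    return False

print(play_tennis("Sunny", "High", "Weak"))       # False
print(play_tennis("Overcast", "High", "Strong"))  # True
print(play_tennis("Rain", "High", "Weak"))        # True
```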
Decision Tree Example (figure omitted)
Appropriate Problems for Decision Trees

• Instances are represented by attribute-value pairs
• The target function has discrete output values
• Disjunctive hypotheses may be required
• The training data may be noisy
  - data may contain errors
  - data may contain missing attribute values
Learning of Decision Trees
Top-Down Induction of Decision Trees

• Algorithm: the ID3 learning algorithm (Quinlan, 1986)
• If all examples from E belong to the same class Cj, then label the leaf with Cj; else:
  - select the "best" decision attribute A, with values v1, v2, ..., vn, for the next node
  - divide the training set S into S1, ..., Sn according to the values v1, ..., vn
  - recursively build subtrees T1, ..., Tn for S1, ..., Sn
  - generate decision tree T
Entropy
• S: a sample of training examples; p+ (p-) is the proportion of positive (negative) examples in S
• Entropy(S) = the expected number of bits needed to encode the classification of an arbitrary member of S
• Information theory: an optimal-length code assigns -log2 p bits to a message having probability p
• Expected number of bits to encode "+" or "-" for a random member of S:
  Entropy(S) = - p+ log2 p+ - p- log2 p-
• Generally, for c different classes:
  Entropy(S) = - Σi=1..c pi log2 pi
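A small sketch of the entropy formula above, applied to the PlayTennis collection with 9 positive and 5 negative examples:

```python
import math

def entropy(class_counts):
    """Entropy of a collection given the number of examples in each class."""
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# PlayTennis training set: 9 positive, 5 negative examples.
print(round(entropy([9, 5]), 3))   # 0.940
```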
Entropy
• The entropy function relative to a boolean classification, as the proportion of positive examples varies between 0 and 1 (figure omitted)
• Entropy is a measure of impurity in a collection of examples
Information Gain Search Heuristic
• Gain(S, A): the expected reduction in entropy caused by partitioning the examples of S according to the attribute A
• A measure of the effectiveness of an attribute in classifying the training data:
  Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|Sv| / |S|) · Entropy(Sv)
• Values(A): the possible values of attribute A
• Sv: the subset of S for which attribute A has value v
• The best attribute has maximal Gain(S, A)
  - the aim is to minimise the number of tests needed for classification
Play Tennis: Information Gain
Values(Wind) = {Weak, Strong}
S = [9+, 5-], E(S) = 0.940
Sweak = [6+, 2-], E(Sweak) = 0.811
Sstrong = [3+, 3-], E(Sstrong) = 1.0

Gain(S, Wind) = E(S) - (8/14) · E(Sweak) - (6/14) · E(Sstrong)
             = 0.940 - (8/14) · 0.811 - (6/14) · 1.0 = 0.048

Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Temperature) = 0.029
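A short, self-contained sketch that reproduces the Gain(S, Wind) computation above from the class counts of each partition:

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def information_gain(total_counts, partitions):
    """Gain(S, A) = Entropy(S) - sum over values v of (|Sv|/|S|) * Entropy(Sv)."""
    n = sum(total_counts)
    return entropy(total_counts) - sum(sum(p) / n * entropy(p) for p in partitions)

# Wind splits S = [9+, 5-] into Weak = [6+, 2-] and Strong = [3+, 3-].
print(round(information_gain([9, 5], [[6, 2], [3, 3]]), 3))   # 0.048
```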
Entropy and Information Gain
• S contains si tuples of class Ci, for i = 1, ..., m
• Information required to classify an arbitrary tuple:
  I(s1, s2, ..., sm) = - Σi=1..m (si / s) log2 (si / s)
• Entropy of attribute A with values {a1, a2, ..., av}:
  E(A) = Σj=1..v ((s1j + ... + smj) / s) · I(s1j, ..., smj)
• Information gained by branching on attribute A:
  Gain(A) = I(s1, s2, ..., sm) - E(A)
The ID3 Algorithm
function ID3 (R: a set of non-categorical attributes,
              C: the categorical attribute,
              S: a training set) returns a decision tree;
begin
  If S is empty, return a single node with value Failure;
  If S consists of records all with the same value for the categorical attribute,
    return a single node with that value;
  If R is empty, then return a single node with as value the most frequent of the
    values of the categorical attribute that are found in records of S;
    [note that then there will be errors, that is, records that will be improperly classified];
The ID3 Algorithm
  Let D be the attribute with largest Gain(D, S) among attributes in R;
  Let {dj | j = 1, 2, ..., m} be the values of attribute D;
  Let {Sj | j = 1, 2, ..., m} be the subsets of S consisting respectively of records
    with value dj for attribute D;
  Return a tree with root labeled D and arcs labeled d1, d2, ..., dm going
    respectively to the trees
    ID3(R-{D}, C, S1), ID3(R-{D}, C, S2), ..., ID3(R-{D}, C, Sm);
end ID3;
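A compact, illustrative Python version of the procedure above; the tiny four-record dataset at the end is invented just to show the shape of the returned tree, and the same-class and empty-attribute base cases follow the pseudocode (the empty-S case never arises here because branch values are taken from the records themselves).

```python
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    return -sum(c / len(labels) * math.log2(c / len(labels)) for c in counts.values())

def id3(rows, labels, attributes):
    """rows: list of dicts mapping attribute -> value; labels: parallel list of classes."""
    if len(set(labels)) == 1:                       # all records have the same class
        return labels[0]
    if not attributes:                              # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]

    def gain(a):                                    # information gain of splitting on a
        remainder = 0.0
        for v in set(r[a] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[a] == v]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)                # attribute with the largest gain
    tree = {best: {}}
    for v in set(r[best] for r in rows):
        sub_rows = [r for r in rows if r[best] == v]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == v]
        tree[best][v] = id3(sub_rows, sub_labels, [a for a in attributes if a != best])
    return tree

# Invented four-record subset, just to show the output format (branch order may vary):
rows = [{"Outlook": "Sunny", "Humidity": "High"},
        {"Outlook": "Sunny", "Humidity": "Normal"},
        {"Outlook": "Overcast", "Humidity": "High"},
        {"Outlook": "Rain", "Humidity": "High"}]
print(id3(rows, ["No", "Yes", "Yes", "Yes"], ["Outlook", "Humidity"]))
# e.g. {'Outlook': {'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}}, 'Overcast': 'Yes', 'Rain': 'Yes'}}
```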
C4.5
• C4.5 is a program that creates a decision tree based on a set of labeled input data.
• This decision tree can then be tested against unseen labeled test data to quantify how well it generalizes.
• The software for C4.5 can be obtained with Quinlan's book. A wide variety of training and test data is available, some provided by Quinlan.
• Quinlan, J.R. now works at the RuleQuest Research company; See5/C5.0 has been designed to operate on large databases and incorporates innovations such as boosting.
C4.5
• C4.5 is a software extension of the basic ID3 algorithm designed by Quinlan to address the following issues not dealt with by ID3:
  - Avoiding overfitting the data
    · Determining how deeply to grow a decision tree
    · Reduced-error pruning
    · Rule post-pruning
  - Handling continuous attributes (e.g., temperature)
  - Choosing an appropriate attribute selection measure
  - Handling training data with missing attribute values
  - Handling attributes with differing costs
  - Improving computational efficiency
Running c4.5
• On cunix.columbia.edu:
  ~amr2104/c4.5/bin/c4.5 -u -f filestem
• c4.5 expects to find 3 files:
  - filestem.names
  - filestem.data
  - filestem.test
File Format: .names
• The file begins with a comma-separated list of classes ending with a period, followed by a blank line
  - E.g., >50K, <=50K.
• The remaining lines have the following format (note the end-of-line period):
  - Attribute: {ignore, discrete n, continuous, list}.
Example: census.names
>50K, <=50K.

age: continuous.
workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, etc.
fnlwgt: continuous.
education: Bachelors, Some-college, 11th, HS-grad, Prof-school, etc.
education-num: continuous.
marital-status: Married-civ-spouse, Divorced, Never-married, etc.
occupation: Tech-support, Craft-repair, Other-service, Sales, etc.
relationship: Wife, Own-child, Husband, Not-in-family, Unmarried.
race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
sex: Female, Male.
capital-gain: continuous.
capital-loss: continuous.
hours-per-week: continuous.
native-country: United-States, Cambodia, England, Puerto-Rico, etc.
File Format: .data, .test
• Each line in these data files is a comma-separated list of attribute values ending with a class label followed by a period.
  - The attributes must be in the same order as described in the .names file.
  - Unavailable values can be entered as '?'
• When creating test sets, make sure that you remove those data points from the training data.
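A small sketch that writes the three files in the format just described; the "weather" filestem, attributes, and rows are invented for illustration only.

```python
# Sketch: write the three files c4.5 expects for a hypothetical "weather" filestem.
classes = ["Yes", "No"]
attributes = {
    "outlook": ["Sunny", "Overcast", "Rain"],   # discrete attribute: list its values
    "temperature": "continuous",                # numeric attribute
}
training_rows = [(["Sunny", "85"], "No"), (["Overcast", "72"], "Yes")]
test_rows = [(["Rain", "65"], "Yes")]

with open("weather.names", "w") as f:
    f.write(", ".join(classes) + ".\n\n")        # class list ends with a period, then a blank line
    for name, spec in attributes.items():
        values = spec if isinstance(spec, str) else ", ".join(spec)
        f.write(f"{name}: {values}.\n")          # one attribute per line, ending with a period

def write_cases(path, rows):
    with open(path, "w") as f:
        for values, label in rows:               # attribute values, then class label, then a period
            f.write(", ".join(values) + ", " + label + ".\n")

write_cases("weather.data", training_rows)
write_cases("weather.test", test_rows)
```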
Example: adult.test
25, Private, 226802, 11th, 7, Never-married, Machine-op-inspct, Own-child,
Black, Male, 0, 0, 40, United-States, <=50K.
38, Private, 89814, HS-grad, 9, Married-civ-spouse, Farming-fishing, Husband,
White, Male, 0, 0, 50, United-States, <=50K.
28, Local-gov, 336951, Assoc-acdm, 12, Married-civ-spouse, Protective-serv,
Husband, White, Male, 0, 0, 40, United-States, >50K.
44, Private, 160323, Some-college, 10, Married-civ-spouse, Machine-op-inspct,
Husband, Black, Male, 7688, 0, 40, United-States, >50K.
18, ?, 103497, Some-college, 10, Never-married, ?, Own-child, White, Female,
0, 0, 30, United-States, <=50K.
34, Private, 198693, 10th, 6, Never-married, Other-service, Not-in-family,
White, Male, 0, 0, 30, United-States, <=50K.
29, ?, 227026, HS-grad, 9, Never-married, ?, Unmarried, Black, Male, 0, 0, 40,
United-States, <=50K.
63, Self-emp-not-inc, 104626, Prof-school, 15, Married-civ-spouse, Prof-specialty, Husband, White, Male, 3103, 0, 32, United-States, >50K.
24, Private, 369667, Some-college, 10, Never-married, Other-service,
Unmarried, White, Female, 0, 0, 40, United-States, <=50K.
55, Private, 104996, 7th-8th, 4, Married-civ-spouse, Craft-repair, Husband,
White, Male, 0, 0, 10, United-States, <=50K.
65, Private, 184454, HS-grad, 9, Married-civ-spouse, Machine-op-inspct,
Husband, White, Male, 6418, 0, 40, United-States, >50K.
36, Federal-gov, 212465, Bachelors, 13, Married-civ-spouse, Adm-clerical,
Husband, White, Male, 0, 0, 40, United-States, <=50K.
c4.5 Output
• The decision tree proper
  - (weighted training examples / weighted training error) shown at each leaf
• Tables of training error and testing error
• Confusion matrix
• You'll want to pipe the output of c4.5 to a text file for later viewing
  - E.g., c4.5 -u -f filestem > filestem.results
Example output
capital-gain > 6849 : >50K (203.0/6.2)
capital-gain <= 6849 :
|   capital-gain > 6514 : <=50K (7.0/1.3)
|   capital-gain <= 6514 :
|   |   marital-status = Married-civ-spouse: >50K (18.0/1.3)
|   |   marital-status = Divorced: <=50K (2.0/1.0)
|   |   marital-status = Never-married: >50K (0.0)
|   |   marital-status = Separated: >50K (0.0)
|   |   marital-status = Widowed: >50K (0.0)
|   |   marital-status = Married-spouse-absent: >50K (0.0)
|   |   marital-status = Married-AF-spouse: >50K (0.0)

Tree saved
Example output
Evaluation on training data (4660 items):

    Before Pruning           After Pruning
    ----------------         ---------------------------
    Size      Errors         Size      Errors       Estimate
    1692      366 ( 7.9%)    92        659 (14.1%)  (16.0%)   <<

Evaluation on test data (2376 items):

    Before Pruning           After Pruning
    ----------------         ---------------------------
    Size      Errors         Size      Errors       Estimate
    1692      421 (17.7%)    92        354 (14.9%)  (16.0%)   <<

Confusion matrix:

    (a)    (b)      <- classified as
    ----   ----
    328    251      (a): class >50K
    103    1694     (b): class <=50K
k-fold Cross Validation
• Start with one large data set.
• Using a script, randomly divide this data set into k sets.
• At each iteration, use k-1 sets to train the decision tree, and the remaining set to test the model.
• Repeat this k times and take the average testing error.
• The average error describes how well the learning algorithm can be applied to the data set.
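A minimal sketch of the procedure above; the dummy learner and toy data are invented purely to show the splitting and averaging.

```python
import random

def k_fold_indices(n_examples, k, seed=0):
    """Randomly partition example indices into k (nearly) equal folds."""
    idx = list(range(n_examples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(examples, k, train_and_test):
    """train_and_test(train, test) returns a test error; return the average over k folds."""
    folds = k_fold_indices(len(examples), k)
    errors = []
    for i in range(k):
        test = [examples[j] for j in folds[i]]
        train = [examples[j] for f in folds[:i] + folds[i + 1:] for j in f]
        errors.append(train_and_test(train, test))
    return sum(errors) / k

# Toy usage: a dummy "learner" that always predicts the majority training label.
data = [(x, x % 2) for x in range(20)]

def dummy_learner(train, test):
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return sum(1 for _, label in test if label != majority) / len(test)

print(cross_validate(data, k=5, train_and_test=dummy_learner))
```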
Outline
• Introduction
• Machine learning
• Version space and bias
• Decision tree learning
• Ripper algorithm
• Summary
Inductive Learning
Figure (omitted): training examples (data-case 1 : decision i1, ..., data-case n : decision in) are fed to an Inductive Learning Unit, which produces decision rules (pattern 1 → decision j1, ..., pattern n → decision jn).

Inductive "Learning from Examples."
Ripper
• RIPPER (Repeated Incremental Pruning to Produce Error Reduction)
• The Ripper algorithm was proposed by Cohen in 1995
• Ripper consists of two phases: the first determines the initial rule set, and the second performs post-process rule optimization
Ripper
• Ripper is a separate-and-conquer rule learning algorithm. First the training data are divided into a growing set and a pruning set. The algorithm then generates a rule set in a greedy fashion, one rule at a time. While generating a rule, Ripper searches for the most valuable rule for the current growing set in a rule space which can be defined in the form of a BNF grammar. Immediately after a rule is extracted on the growing set, it is pruned on the pruning set. After pruning, the corresponding examples covered by that rule in the training set (growing and pruning sets) are deleted. The remaining training data are re-partitioned after each rule is learned, in order to help stabilize any problems caused by a "bad split". This process is repeated until the terminal conditions are satisfied.
Ripper
procedure Rule_Generating(Pos, Neg)
begin
  Ruleset := {}
  while Pos ≠ {} do
    /* grow and prune a new rule */
    split (Pos, Neg) into (GrowPos, GrowNeg) and (PrunePos, PruneNeg)
    Rule := GrowRule(GrowPos, GrowNeg)
    Rule := PruneRule(Rule, PrunePos, PruneNeg)
    if the terminal conditions are satisfied then
      return Ruleset
    else
      add Rule to Ruleset
      remove examples covered by Rule from (Pos, Neg)
    endif
  endwhile
  return Ruleset
end
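The following is a toy, illustrative Python skeleton of the separate-and-conquer loop above. It is not Cohen's actual RIPPER: the grow and prune heuristics are deliberately simplistic (no FOIL gain, no MDL-based stopping), rules are plain conjunctions of attribute = value tests, and the example data at the end is invented.

```python
import random

def covers(rule, x):
    """A rule is a conjunction of (attribute, value) tests over an example dict x."""
    return all(x[a] == v for a, v in rule)

def grow_rule(grow_pos, grow_neg, attributes):
    """Greedily add the single test that best separates positives from negatives."""
    rule, pos, neg = [], list(grow_pos), list(grow_neg)
    while neg and len(rule) < len(attributes):
        candidates = {(a, x[a]) for x in pos for a in attributes if (a, x[a]) not in rule}
        best = max(candidates, key=lambda c: sum(covers(rule + [c], p) for p in pos)
                                             - sum(covers(rule + [c], n) for n in neg))
        rule.append(best)
        pos = [x for x in pos if covers(rule, x)]
        neg = [x for x in neg if covers(rule, x)]
    return rule

def prune_rule(rule, prune_pos, prune_neg):
    """Drop trailing tests while the rule's score on the pruning set does not get worse."""
    def score(r):
        p = sum(covers(r, x) for x in prune_pos)
        n = sum(covers(r, x) for x in prune_neg)
        return (p - n) / (p + n) if p + n else -1.0
    while len(rule) > 1 and score(rule[:-1]) >= score(rule):
        rule = rule[:-1]
    return rule

def rule_generating(pos, neg, attributes, grow_fraction=2/3, seed=0):
    """Separate-and-conquer: grow and prune one rule at a time, removing covered examples."""
    rng = random.Random(seed)
    pos, neg, ruleset = list(pos), list(neg), []
    while pos:
        rng.shuffle(pos)                                        # re-partition before each rule
        rng.shuffle(neg)
        cut_p = max(1, int(len(pos) * grow_fraction))
        cut_n = max(1, int(len(neg) * grow_fraction)) if neg else 0
        rule = grow_rule(pos[:cut_p], neg[:cut_n], attributes)
        rule = prune_rule(rule, pos[cut_p:], neg[cut_n:])
        ruleset.append(rule)                                    # an empty rule covers everything
        pos = [x for x in pos if not covers(rule, x)]
        neg = [x for x in neg if not covers(rule, x)]
    return ruleset

pos = [{"outlook": "sunny", "windy": "no"}, {"outlook": "overcast", "windy": "yes"}]
neg = [{"outlook": "rain", "windy": "yes"}]
print(rule_generating(pos, neg, ["outlook", "windy"]))
```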
Ripper
• After each rule is added to the rule set, the total description length (an integer value) of the rule set is computed. The description length gives a measure of the complexity and accuracy of a rule set. The terminal conditions are satisfied when there are no positive examples left, or when the description length of the current rule set exceeds the user-specified threshold.
Ripper
• Post-process rule optimization
• Ripper uses post-pruning techniques to optimize the rule set. This optimization is performed on any remaining positive examples. Re-optimizing the resulting rule set is called RIPPER2, and the general case of re-optimizing k times is called RIPPERk.
Outline
• Introduction
• Machine learning
• Version space and bias
• Decision tree learning
• Ripper algorithm
• Summary
Summary
• Inductive learning is an important approach for data mining
• Version spaces can be used to explain generalization and specialization
• ID3 and C4.5 build decision trees
• The Ripper algorithm generates efficient rules
References
• Zhongzhi Shi. Principles of Machine Learning. International Academic Publishers, 1992.
• Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2000.
• Zhongzhi Shi. Knowledge Discovery. Tsinghua University Press, 2002.
• H. Liu and H. Motoda. Feature Selection for Knowledge Discovery and Data Mining. Kluwer Academic Publishers, 1998.
• R. S. Michalski. A theory and methodology of inductive learning. In Michalski et al., editors, Machine Learning: An Artificial Intelligence Approach, Vol. 1, Morgan Kaufmann, 1983.
• T. M. Mitchell. Version spaces: a candidate elimination approach to rule learning. IJCAI'77, Cambridge, MA.
• J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
• T. M. Mitchell. Machine Learning. McGraw Hill, 1997.
• J. R. Quinlan. Induction of decision trees. Machine Learning, 1:81-106, 1986.
www.intsci.ac.cn/shizz/
Questions?!