Rule Induction with Extension Matrices

Dr. Xindong Wu
Yuen F. Helbig
Outline

- Extension matrix approach for rule induction
- The MFL and MCV optimization problems
- The AE1 solution
- The HCV solution
- Noise handling and discretization in HCV
- Comparison of HCV with ID3-like algorithms, including C4.5 and C4.5rules
Extension Matrix Terminology

- a: number of attributes
- Xa: the a-th attribute
- e+: vector of positive examples
- e-: vector of negative examples
- v+ak: value of the a-th attribute in the k-th positive example
- n: number of negative examples
- p: number of positive examples
- (rij)a×b: an a×b matrix whose (i,j)-th element is rij
- A(i,j): the (i,j)-th element of matrix A
Extension Matrix Definitions

- A positive example is an example that belongs to a known class, say 'Play':

    e+k = (v+1k, ..., v+ak)

    e.g., (overcast, mild, high, windy) => Play

- All the other examples are called negative examples:

    e-k = (v-1k, ..., v-ak)

    e.g., (rainy, hot, high, windy) => Don't Play
Negative Example Matrix

The negative example matrix is defined as

    NEM = (e-1, ..., e-n)^T = (r-ij)n×a

For the 'Play' example:

    rainy  hot   high    windy
    rainy  cool  normal  windy
    sunny  hot   normal  windy
    sunny  mild  high    windy
Extension Matrix

The extension matrix (EM) of a positive example e+k against NEM is defined as

    EMk = (rkij)n×a,  k in {1, ..., p}

    rkij = NEMij             when v+jk ≠ NEMij
         = * (dead element)  when v+jk = NEMij
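The dead-element rule above can be sketched in Python. This is a minimal illustration (not code from the original work); attribute values are plain strings and `*` stands for a dead element:

```python
DEAD = "*"  # dead element

def extension_matrix(nem, pos):
    """EM of a positive example against NEM: keep NEM(i,j) where the
    positive example differs on attribute j, else mark a dead element."""
    return [[DEAD if pos[j] == row[j] else row[j] for j in range(len(row))]
            for row in nem]

nem = [("rainy", "hot",  "high",   "windy"),
       ("rainy", "cool", "normal", "windy"),
       ("sunny", "hot",  "normal", "windy"),
       ("sunny", "mild", "high",   "windy")]

# Positive example (overcast, mild, high, windy) from the 'Play' data
for row in extension_matrix(nem, ("overcast", "mild", "high", "windy")):
    print(row)
```

Each printed row keeps exactly the negative values that the positive example does not share, matching the worked example on the next slide.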
Example Extension Matrix

Negative Example Matrix (NEM):

    rainy  hot   high    windy
    rainy  cool  normal  windy
    sunny  hot   normal  windy
    sunny  mild  high    windy

Positive Example: (overcast, mild, high, windy)
Example Extension Matrix

Extension Matrix (EM):

    rainy  hot   *       *
    rainy  cool  normal  *
    sunny  hot   normal  *
    sunny  *     *       *

Positive Example: (overcast, mild, high, windy)
Paths in Extension Matrices

A set of n non-dead elements that come from the n different rows is called a path in an extension matrix.

    X1  X2  X3   (attributes)
     1   *   *
     *   0   1
     1   0   *

e.g., {X1 = 1, X2 = 0, X1 = 1} and {X1 = 1, X3 = 1, X2 = 0} are paths in the extension matrix above.
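Since a path picks one non-dead element from every row, a path exists exactly when no row is entirely dead. A small check along those lines (illustrative code, not from the original work):

```python
DEAD = "*"  # dead element

def has_path(em):
    """A path picks one non-dead element from every row, so a path exists
    iff no row of the extension matrix is entirely dead."""
    return all(any(x != DEAD for x in row) for row in em)

# The 3x3 matrix from the slide above (as reconstructed)
em = [[1, DEAD, DEAD],
      [DEAD, 0, 1],
      [1, 0, DEAD]]
print(has_path(em))   # True
```

This test reappears later inside HCV as the condition "there exists at least one path in EM2".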
Conjunctive Formulas

A path {r1j1, ..., rnjn} in the EMk of positive example e+k against NEM corresponds to a conjunctive formula, or cover,

    [Xj1 ≠ r1j1] ∧ ... ∧ [Xjn ≠ rnjn]

Path: {X1 = 1, X2 = 0, X1 = 1}
Formula: [X1 ≠ 1] ∧ [X2 ≠ 0] ∧ [X1 ≠ 1] = [X1 ≠ 1] ∧ [X2 ≠ 0]

Path: {X1 = 1, X3 = 1, X2 = 0}
Formula: [X1 ≠ 1] ∧ [X3 ≠ 1] ∧ [X2 ≠ 0]
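The correspondence can be made concrete with a small coverage check (a sketch, not the original implementation; selectors are (attribute-index, excluded-value) pairs with 0-based indices):

```python
def covers(selectors, example):
    """A conjunctive formula of selectors [X_j != v] covers an example
    iff the example's value differs from v on every selected attribute."""
    return all(example[j] != v for j, v in selectors)

# Path {X1 = 1, X2 = 0, X1 = 1} gives the formula [X1 != 1] & [X2 != 0]
formula = {(0, 1), (1, 0)}
print(covers(formula, (0, 1, 1)))   # True: this example satisfies the cover
print(covers(formula, (1, 0, 1)))   # False: a row with X1 = 1 is excluded
```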
Extension Matrix Disjunction

The disjunction matrix EMD of a group of positive examples {e+i1, ..., e+ik} is defined as

    EMD = (rij)n×a

    rij = *         when there exists k' in {i1, ..., ik} such that EMk'(i,j) = *
        = NEM(i,j)  otherwise

A path {r1j1, ..., rnjn} in the EMD of {e+i1, ..., e+ik} against NEM corresponds to a conjunctive formula, or cover,

    [Xj1 ≠ r1j1] ∧ ... ∧ [Xjn ≠ rnjn]

which covers all of e+i1, ..., e+ik against NEM, and vice versa.
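The EMD is an element-wise intersection of the member EMs: a cell survives only if it is non-dead in every one of them. A sketch under the same string-valued conventions as before (illustrative, not the paper's code):

```python
DEAD = "*"  # dead element

def em_of(nem, pos):
    return [[DEAD if pos[j] == row[j] else row[j] for j in range(len(row))]
            for row in nem]

def emd_of(nem, positives):
    """EMD: a cell stays NEM(i,j) only if it is non-dead in every member EM."""
    ems = [em_of(nem, p) for p in positives]
    n, a = len(nem), len(nem[0])
    return [[nem[i][j] if all(em[i][j] != DEAD for em in ems) else DEAD
             for j in range(a)] for i in range(n)]

nem = [("rainy", "hot",  "high",   "windy"),
       ("rainy", "cool", "normal", "windy"),
       ("sunny", "hot",  "normal", "windy"),
       ("sunny", "mild", "high",   "windy")]
group = [("overcast", "mild", "high", "windy"),
         ("overcast", "mild", "normal", "calm")]
for row in emd_of(nem, group):
    print(row)
```

The printed matrix matches the second EMD slide below: only the Outlook and Temperature columns keep non-dead elements once both positive examples are in the group.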
EMD Example

Negative Example Matrix (NEM):

    rainy  hot   high    windy
    rainy  cool  normal  windy
    sunny  hot   normal  windy
    sunny  mild  high    windy
EMD Example

Extension Matrix Disjunction (EMD):

    rainy  hot   *       *
    rainy  cool  normal  *
    sunny  hot   normal  *
    sunny  *     *       *

Positive Example: (overcast, mild, high, windy)
EMD Example

Extension Matrix Disjunction (EMD), after adding a second positive example:

    rainy  hot   *  *
    rainy  cool  *  *
    sunny  hot   *  *
    sunny  *     *  *

Positive Example: (overcast, mild, normal, calm)
EMD Example

Extension Matrix Disjunction (EMD), after adding a third positive example:

    *      *     *  *
    *      cool  *  *
    sunny  *     *  *
    sunny  *     *  *

Positive Example: (rainy, hot, high, calm)

The first row is now entirely dead, so no path exists: this example cannot be covered by the same conjunctive formula as the first two.
MFL and MCV (1)

- The minimum formula problem (MFL): generating a conjunctive formula that covers a positive example, or an intersecting group of positive examples, against NEM and has the minimum number of different conjunctive selectors
- The minimum cover problem (MCV): seeking a cover that covers all positive examples in PE against NEM and has the minimum number of conjunctive formulae, with each conjunctive formula being as short as possible
MFL and MCV (2)

- Both problems are NP-hard
- Two complete algorithms are designed to solve them when each attribute domain Di (i = 1, ..., a) satisfies |Di| ≤ 2:
  - O(na·2^a) for MFL
  - O(n²a·4^a + pa²·4^a) for MCV
- When |Di| > 2, the domain can be decomposed into several domains, each having base 2
AE1 Heuristic

- Starts the search from the columns with the most non-dead elements
- Simplifies redundancy by deductive inference rules in mathematical logic
Problems with AE1

- AE1 can easily lose the optimum solution:

    X1  X2  X3
     1   *   *
     1   0   1
     1   0   *
     *   0   1
     *   0   1
     *   *   1

  Here, AE1 will select [X2 ≠ 0], [X1 ≠ 1], and [X3 ≠ 1], instead of [X1 ≠ 1] and [X3 ≠ 1]
- Simplifying redundancy for MFL and MCV is itself NP-hard
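The failure mode can be reproduced with a few lines of greedy search. The matrix below is the 6×3 example as reconstructed from the transcript (an assumption worth flagging); the greedy rule is AE1's "most non-dead elements first":

```python
DEAD = "*"

# The 6x3 matrix from the slide above (as reconstructed from the transcript)
em = [[1, DEAD, DEAD],
      [1, 0, 1],
      [1, 0, DEAD],
      [DEAD, 0, 1],
      [DEAD, 0, 1],
      [DEAD, DEAD, 1]]

def greedy_cover(em):
    """AE1-style greedy search: repeatedly take the column with the most
    non-dead elements among the still-uncovered rows."""
    uncovered = set(range(len(em)))
    picked = []
    while uncovered:
        j = max(range(len(em[0])),
                key=lambda j: sum(em[i][j] != DEAD for i in uncovered))
        picked.append(j)
        uncovered -= {i for i in uncovered if em[i][j] != DEAD}
    return picked

print(greedy_cover(em))   # [1, 0, 2]: three selectors, where X1 and X3 suffice
```

X2's column has the most non-dead elements, so it is chosen first, after which rows 1 and 6 each force an extra selector; picking X1 and X3 directly would have covered all six rows with two selectors.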
What is HCV?

- HCV is an extension-matrix-based rule induction algorithm which is:
  - Heuristic
  - Attribute-based
  - Noise-tolerant
- It divides the positive examples into intersecting groups
- It uses the HFL heuristics to find a conjunctive formula which covers each intersecting group
- It has low-order polynomial time complexity at induction time
HCV Issues

- The HCV algorithm
- The HFL heuristics
- Speed and efficiency
- Noise handling capabilities
- Dealing with numeric and nominal data
- Accuracy and description compactness
HCV Algorithm (1)

    Procedure HCV(EM1, ..., EMp; Hcv)
      integer n, a, p
      matrix EM1(n,a), ..., EMp(n,a), D(p)
      set Hcv
    S1: D ← 0
        /* D(j) = 1 (j = 1, ..., p) indicates that EMj has been
           put into an intersecting group */
        Hcv ← {}          /* initialization */
    S2: for i = 1 to p, do
          if D(i) = 0 then
          { EM ← EMi

HCV Algorithm (2)

            for j = i+1 to p, do
              if D(j) = 0 then
              { EM2 ← EM ∩ EMj
                if there exists at least one path in EM2
                then { EM ← EM2; D(j) ← 1 }
              }
            next j
            call HFL(EM; Hfl)
            Hcv ← Hcv ∪ Hfl
          }
        next i
        return(Hcv)
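The grouping loop above can be sketched as ordinary Python (an illustration, not the paper's implementation; `hfl` is passed in as a callable so the sketch stays independent of how formulas are extracted):

```python
DEAD = "*"

def em_of(nem, pos):
    return [[DEAD if pos[j] == row[j] else row[j] for j in range(len(row))]
            for row in nem]

def has_path(em):
    return all(any(x != DEAD for x in row) for row in em)

def intersect(a, b):
    return [[x if (x != DEAD and y != DEAD) else DEAD for x, y in zip(r, s)]
            for r, s in zip(a, b)]

def hcv(ems, hfl):
    """Greedy grouping loop of HCV (steps S1-S2): grow an intersecting group
    while at least one path survives, then hand each group's matrix to HFL."""
    grouped = [False] * len(ems)
    rules = []
    for i, em in enumerate(ems):
        if grouped[i]:
            continue
        current = em
        for j in range(i + 1, len(ems)):
            if not grouped[j]:
                merged = intersect(current, ems[j])
                if has_path(merged):
                    current, grouped[j] = merged, True
        rules.append(hfl(current))
    return rules

# 'Play' data: the third positive example kills row 1, so it starts a new group
nem = [("rainy","hot","high","windy"), ("rainy","cool","normal","windy"),
       ("sunny","hot","normal","windy"), ("sunny","mild","high","windy")]
positives = [("overcast","mild","high","windy"),
             ("overcast","mild","normal","calm"),
             ("rainy","hot","high","calm")]
ems = [em_of(nem, p) for p in positives]
print(len(hcv(ems, hfl=lambda em: em)))   # 2 intersecting groups
```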
HFL - Fast Strategy

    absent  slight  strip  *     normal
    *       *       hole   fast  dry-peep
    low     slight  strip  fast  normal
    absent  slight  spot   fast  dry-peep
    low     medium  *      fast  normal

Selector [X5 ∉ {normal, dry-peep}] can be a possible selector, which will cover all 5 rows.
possible selector, which will cover all 5 rows
HFL - Precedence

    X1  X2  X3
     1   *   *
     1   0   1
     1   0   *
     *   0   1
     *   0   1
     *   *   1

Selectors [X1 ≠ 1] and [X3 ≠ 1] are two inevitable selectors in the above extension matrix: each is the only non-dead element in some row.
HFL - Elimination

    X1  X2  X3  X4
     1   *   1   *
     1   0   1   0
     1   0   1   *
     1   0   1   *
     *   0   1   *
     *   *   0   *

Attribute X2 can be eliminated by X3: every row in which X2 is non-dead also has a non-dead element in X3.
HFL - Least Frequency

    X1  X2  X3
     1   *   1
     1   0   1
     1   0   *
     *   0   1
     *   0   *
     *   0   1

Attribute X1, with the fewest non-dead elements, can be eliminated and there still exists a path.
HFL Algorithm (1)

    Procedure HFL(EM; Hfl)
    S0: Hfl ← {}
    S1: /* the fast strategy */
        Try the fast strategy on all those rows which haven't been covered;
        if successful, add a corresponding selector to Hfl and return(Hfl)
    S2: /* the precedence strategy */
        Apply the precedence strategy to the uncovered rows;
        if some inevitable selectors are found, add them to Hfl,
        label all the rows they cover, and go to S1

HFL Algorithm (2)

    S3: /* the elimination strategy */
        Apply the elimination strategy to those attributes that have
        neither been selected nor eliminated;
        if an eliminable selector is found, reset all the elements in the
        corresponding column to *, and go to S2
    S4: /* the least-frequency strategy */
        Apply the least-frequency strategy to those attributes which have
        neither been selected nor eliminated, and find a least-frequency
        selector;
        reset all the elements in the corresponding column to *, and go to S2
    Return(Hfl)
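A simplified sketch of this loop, assuming the conventions used earlier (a rule is a `(column, frozenset of excluded values)` pair, read as [X_col ∉ values]). It implements the fast, precedence, and least-frequency strategies; the elimination strategy is omitted for brevity, so this is an illustration rather than the paper's HFL:

```python
DEAD = "*"

def hfl(em):
    """Simplified HFL: fast -> precedence -> least-frequency (no elimination)."""
    uncovered = set(range(len(em)))
    alive = set(range(len(em[0])))          # columns not yet discarded
    rules = []
    while uncovered and alive:
        # S1 fast strategy: one column non-dead on every uncovered row
        fast = next((j for j in sorted(alive)
                     if all(em[i][j] != DEAD for i in uncovered)), None)
        if fast is not None:
            rules.append((fast, frozenset(em[i][fast] for i in uncovered)))
            return rules
        # S2 precedence: a row with a single live non-dead element
        # makes that element's selector inevitable
        row = next((i for i in sorted(uncovered)
                    if sum(em[i][j] != DEAD for j in alive) == 1), None)
        if row is not None:
            j = next(j for j in alive if em[row][j] != DEAD)
            vals = frozenset(em[i][j] for i in uncovered if em[i][j] != DEAD)
            rules.append((j, vals))
            uncovered = {i for i in uncovered if em[i][j] == DEAD}
            continue
        # S4 least-frequency: discard the sparsest column and retry
        j = min(sorted(alive),
                key=lambda j: sum(em[i][j] != DEAD for i in uncovered))
        alive.discard(j)
    return rules

# Final EMD of the diagnosis example (Fever, Cough, X-Ray, ESR, Auscultation)
em = [["absent", DEAD, "strip", DEAD, "normal"],
      [DEAD, DEAD, "hole", "fast", DEAD],
      [DEAD, DEAD, "strip", "fast", "normal"],
      ["absent", DEAD, DEAD, "fast", DEAD],
      [DEAD, DEAD, DEAD, "fast", "normal"]]
print(hfl(em))
```

On this matrix the sketch finds [ESR ≠ fast] first, as in the slides; for the one remaining row it happens to pick [X-Ray ≠ strip], an equally valid cover, where the slides pick [Auscultation ≠ normal].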
Complexity of HFL

- S1: O(na)
- S2: O(na)
- S3: O(na²)
- S4: O(na)
- Overall: O(a(na + na + na² + na)) = O(na³)
Complexity of HCV

- Worst-case time complexity:

    O( Σ_{i=1}^{p} ( na + Σ_{j=i+1}^{p} (2na + na + na + 1) + (na³ + 1) ) ) = O(pna³ + p²na)

- Space requirement: 2na
HCV Example

    Fever    Cough   X-Ray  ESR     Auscultation  Disease
    high     heavy   flack  normal  bubble-like   Pneumonia
    medium   heavy   flack  normal  bubble-like   Pneumonia
    low      slight  spot   normal  dry-peep      Pneumonia
    high     medium  flack  normal  bubble-like   Pneumonia
    medium   slight  flack  normal  bubble-like   Pneumonia
    absent   slight  strip  normal  normal        Tuberculosis
    high     heavy   hole   fast    dry-peep      Tuberculosis
    low      slight  strip  fast    normal        Tuberculosis
    absent   slight  spot   fast    dry-peep      Tuberculosis
    low      medium  flack  fast    normal        Tuberculosis
HCV Example

NEM (the five Tuberculosis examples):

    absent  slight  strip  normal  normal
    high    heavy   hole   fast    dry-peep
    low     slight  strip  fast    normal
    absent  slight  spot   fast    dry-peep
    low     medium  flack  fast    normal
HCV Example

EM1:

    absent  slight  strip  *     normal
    *       *       hole   fast  dry-peep
    low     slight  strip  fast  normal
    absent  slight  spot   fast  dry-peep
    low     medium  *      fast  normal

Positive Example 1: (high, heavy, flack, normal, bubble-like)
HCV Example

EM2:

    absent  slight  strip  *     normal
    high    *       hole   fast  dry-peep
    low     slight  strip  fast  normal
    absent  slight  spot   fast  dry-peep
    low     medium  *      fast  normal

Positive Example 2: (medium, heavy, flack, normal, bubble-like)
HCV Example

EM3:

    absent  *       strip  *     normal
    high    heavy   hole   fast  *
    *       *       strip  fast  normal
    absent  *       *      fast  *
    *       medium  flack  fast  normal

Positive Example 3: (low, slight, spot, normal, dry-peep)
HCV Example

EM4:

    absent  slight  strip  *     normal
    *       heavy   hole   fast  dry-peep
    low     slight  strip  fast  normal
    absent  slight  spot   fast  dry-peep
    low     *       *      fast  normal

Positive Example 4: (high, medium, flack, normal, bubble-like)
HCV Example

EM5:

    absent  *       strip  *     normal
    high    heavy   hole   fast  dry-peep
    low     *       strip  fast  normal
    absent  *       spot   fast  dry-peep
    low     medium  *      fast  normal

Positive Example 5: (medium, slight, flack, normal, bubble-like)
HCV Example

EM1 ∩ EM2:

    absent  slight  strip  *     normal
    *       *       hole   fast  dry-peep
    low     slight  strip  fast  normal
    absent  slight  spot   fast  dry-peep
    low     medium  *      fast  normal
HCV Example

EM1 ∩ EM2 ∩ EM3:

    absent  *       strip  *     normal
    *       *       hole   fast  *
    *       *       strip  fast  normal
    absent  *       *      fast  *
    *       medium  *      fast  normal
HCV Example

EM1 ∩ EM2 ∩ EM3 ∩ EM4:

    absent  *  strip  *     normal
    *       *  hole   fast  *
    *       *  strip  fast  normal
    absent  *  *      fast  *
    *       *  *      fast  normal
HCV Example

EM1 ∩ EM2 ∩ EM3 ∩ EM4 ∩ EM5:

    absent  *  strip  *     normal
    *       *  hole   fast  *
    *       *  strip  fast  normal
    absent  *  *      fast  *
    *       *  *      fast  normal
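The chain of intersections can be checked mechanically. The dataset values below are as reconstructed from the (garbled) slides, so treat them as an assumption; the helper functions follow the earlier definitions:

```python
DEAD = "*"

def em_of(nem, pos):
    return [[DEAD if pos[j] == row[j] else row[j] for j in range(len(row))]
            for row in nem]

def intersect(a, b):
    return [[x if (x != DEAD and y != DEAD) else DEAD for x, y in zip(r, s)]
            for r, s in zip(a, b)]

# Attributes: Fever, Cough, X-Ray, ESR, Auscultation
nem = [("absent", "slight", "strip", "normal", "normal"),
       ("high",   "heavy",  "hole",  "fast",   "dry-peep"),
       ("low",    "slight", "strip", "fast",   "normal"),
       ("absent", "slight", "spot",  "fast",   "dry-peep"),
       ("low",    "medium", "flack", "fast",   "normal")]
positives = [("high",   "heavy",  "flack", "normal", "bubble-like"),
             ("medium", "heavy",  "flack", "normal", "bubble-like"),
             ("low",    "slight", "spot",  "normal", "dry-peep"),
             ("high",   "medium", "flack", "normal", "bubble-like"),
             ("medium", "slight", "flack", "normal", "bubble-like")]

# Fold the five EMs into the final disjunction matrix
final = em_of(nem, positives[0])
for p in positives[1:]:
    final = intersect(final, em_of(nem, p))
for row in final:
    print(row)
```

The printed rows reproduce the final matrix on this slide, confirming that all five positive examples form a single intersecting group.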
HCV Example

HFL Step 1: Fast Strategy

    absent  *  strip  *     normal
    *       *  hole   fast  *
    *       *  strip  fast  normal
    absent  *  *      fast  *
    *       *  *      fast  normal

No single selector covers all five rows. HFL Rules = {}
HCV Example

HFL Step 2: Precedence

    absent  *  strip  *     normal
    *       *  hole   fast  *
    *       *  strip  fast  normal
    absent  *  *      fast  *
    *       *  *      fast  normal

No inevitable selectors are found. HFL Rules = {}
HCV Example

HFL Step 3: Elimination

    absent  *  strip  *     normal
    *       *  hole   fast  *
    *       *  strip  fast  normal
    absent  *  *      fast  *
    *       *  *      fast  normal

No eliminable attribute is found. HFL Rules = {}
HCV Example

HFL Step 4: Least-Frequency

    absent  *  strip  *     normal
    *       *  hole   fast  *
    *       *  strip  fast  normal
    absent  *  *      fast  *
    *       *  *      fast  normal

HFL Rules = {}
HCV Example

HFL Step 4: Least-Frequency

    *  *  strip  *     normal
    *  *  hole   fast  *
    *  *  strip  fast  normal
    *  *  *      fast  *
    *  *  *      fast  normal

Fever, the least-frequent attribute, is eliminated. HFL Rules = {}
HCV Example

HFL Step 2: Precedence

    *  *  strip  *     normal
    *  *  hole   fast  *
    *  *  strip  fast  normal
    *  *  *      fast  *
    *  *  *      fast  normal

Row 4 now has a single non-dead element, so [ESR ≠ fast] is an inevitable selector.
HFL Rules = {ESR ≠ fast}
HCV Example

HFL Step 2: Precedence

The ESR column is reset and rows 2-5, covered by [ESR ≠ fast], are labelled; only row 1 remains:

    *  *  strip  *  normal
    *  *  hole   *  *
    *  *  strip  *  normal
    *  *  *      *  *
    *  *  *      *  normal

HFL Rules = {ESR ≠ fast}
HCV Example

HFL Step 1: Fast Strategy

    *  *  strip  *  normal    (row 1, still uncovered)

Selector [Auscultation ≠ normal] covers the remaining uncovered row.
HFL Rules = {ESR ≠ fast, Auscultation ≠ normal}
HCV Example

All rows are now covered.
HFL Rules = {ESR ≠ fast, Auscultation ≠ normal}
HCV Example

HCV generated rule: Pneumonia ← [ESR ≠ fast] ∧ [Auscultation ≠ normal]
C4.5rules generated rule
HCV versus AE1

- The use of the disjunction matrix (EMD)
- A reasonable solution to MFL and MCV
- Noise handling
- Discretization of attributes
HCV Noise Handling

- Don't-care values are treated as dead elements
- Approximate partitioning
- Stopping criteria
Discretization of Attributes

- Information gain heuristic
- Stop-splitting criteria:
  - Stop if the information gain at all cut points is the same
  - Stop if the number of examples to split is less than a certain number
  - Limit the total number of intervals
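The information gain heuristic for choosing a cut point can be sketched as follows (an illustration of the general entropy-based technique, not HCV's exact discretizer; the temperature values and labels are made up for the demo):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a class-label multiset, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_cut(values, labels):
    """Evaluate candidate cut points midway between adjacent sorted values
    and return the cut with the highest information gain."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best, best_gain = None, -1.0
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue                      # no cut between equal values
        cut = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [lab for _, lab in pairs[:k]]
        right = [lab for _, lab in pairs[k:]]
        gain = base - (len(left) * entropy(left)
                       + len(right) * entropy(right)) / len(pairs)
        if gain > best_gain:
            best, best_gain = cut, gain
    return best, best_gain

# Hypothetical numeric attribute: the best cut falls midway between
# 37.2 and 38.5, separating the two classes perfectly
cut, gain = best_cut([37.0, 37.2, 38.5, 39.1, 36.8, 38.9],
                     ["N", "N", "P", "P", "N", "P"])
print(cut, gain)
```

Repeating this greedily on each resulting interval, until one of the stopping criteria above fires, yields the discretization intervals.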
Comparison (1)

Table 1: Number of rules and conditions using Monk 1, 2 and 3 datasets as training sets 1, 2 and 3 respectively

    Algorithm                | TS1 rules | TS1 conds | TS2 rules | TS2 conds | TS3 rules | TS3 conds
    ID3                      |    53     |    216    |    105    |    498    |    30     |     98
    C4.5                     |    60     |    262    |    113    |    566    |    27     |     89
    C4.5 with grouping       |     9     |     31    |     55    |    353    |    20     |    102
    C4.5rules                |    31     |    101    |     97    |    374    |    23     |     65
    C4.5rules with grouping  |     8     |     19    |     46    |    188    |    11     |     35
    NewID                    |    21     |    143    |     59    |    401    |    18     |    101
    HCV                      |     7     |     16    |     39    |    168    |    18     |     62
Comparison (2)

Table 2: Predictive accuracy

    Algorithm                | Test Set 1 | Test Set 2 | Test Set 3
    ID3                      |   83.3%    |   68.3%    |   94.4%
    C4.5                     |   82.4%    |   69.7%    |   90.3%
    C4.5 with grouping       |   100%     |   82.4%    |   93.1%
    C4.5rules                |   92.4%    |   75.7%    |   85.4%
    C4.5rules with grouping  |   100%     |   81.0%    |   91.4%
    NewID                    |   93%      |   78%      |   89%
    HCV                      |   100%     |   81.7%    |   90.3%
Conclusions

- Rules generated by HCV take the form of variable-valued logic rules, rather than decision trees
- HCV generates very compact rules in low-order polynomial time
- HCV provides noise handling and discretization
- Predictive accuracy is comparable to the ID3 family of algorithms, viz. C4.5 and C4.5rules