Rule Induction with Extension Matrices
Dr. Xindong Wu
Yuen F. Helbig
Outline
Extension matrix approach for rule induction
The MFL and MCV optimization problems
The AE1 solution
The HCV solution
Noise handling and discretization in HCV
Comparison of HCV with ID3-like algorithms
including C4.5 and C4.5 rules
Extension Matrix Terminology
$a$                      Number of attributes
$X_a$                    $a$th attribute
$e^+$                    Vector of positive examples
$e^-$                    Vector of negative examples
$v^+_{ak}$               Value of the $a$th attribute in the $k$th positive example
$n$                      Number of negative examples
$p$                      Number of positive examples
$(r_{ij})_{a \times b}$  $ij$th element of an $a \times b$ matrix
$A(i,j)$                 $ij$th element of matrix $A$
Extension Matrix Definitions
A positive example is an example that belongs to a known class, say 'Play'
$e^+_k = (v^+_{1k}, \ldots, v^+_{ak})$
(overcast, mild, high, windy) => Play
All the other examples can be called negative examples
$e^-_k = (v^-_{1k}, \ldots, v^-_{ak})$
(rainy, hot, high, windy) => Don't Play
Negative Example Matrix
The negative example matrix is defined as
$NEM = (e^-_1, \ldots, e^-_n)^T = (r_{ij})_{n \times a}$
rainy  hot   high    windy
rainy  cool  normal  windy
sunny  hot   normal  windy
sunny  mild  high    windy
Extension Matrix
The extension matrix (EM) of a positive example against NEM is defined as
$EM_k = (r^k_{ij})_{n \times a}, \quad k \in \{1, \ldots, p\}$
$r^k_{ij} = *$ (dead element) when $v^+_{jk} = NEM_{ij}$
$r^k_{ij} = NEM_{ij}$ when $v^+_{jk} \neq NEM_{ij}$
Example Extension Matrix
Negative Example Matrix (NEM)
rainy  hot   high    windy
rainy  cool  normal  windy
sunny  hot   normal  windy
sunny  mild  high    windy
Positive Example
overcast  mild  high  windy
Example Extension Matrix
Extension Matrix (EM)
rainy  hot   *       *
rainy  cool  normal  *
sunny  hot   normal  *
sunny  *     *       *
Positive Example
overcast  mild  high  windy
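The construction on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the HCV system: the function name `extension_matrix` and the `DEAD` marker are assumptions introduced here, and examples are represented as plain lists of attribute values.

```python
DEAD = "*"  # marker used here for a dead element (an assumption of this sketch)

def extension_matrix(nem, positive):
    """Replace every NEM entry that equals the positive example's value
    in the same column with the dead-element marker."""
    return [[DEAD if value == positive[j] else value
             for j, value in enumerate(row)]
            for row in nem]

# Negative example matrix and positive example from the slides.
nem = [
    ["rainy", "hot",  "high",   "windy"],
    ["rainy", "cool", "normal", "windy"],
    ["sunny", "hot",  "normal", "windy"],
    ["sunny", "mild", "high",   "windy"],
]
positive = ["overcast", "mild", "high", "windy"]

em = extension_matrix(nem, positive)
for row in em:
    print(" ".join(row))
```

Printing `em` reproduces the four rows of the extension matrix shown above.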
Paths in Extension Matrices
A set of n non-dead elements that come from n different rows (one element per row) is called a path in an extension matrix
X1  X2  X3   (attributes)
1   *   *
*   0   1
1   0   *
Extension matrix
e.g., {X1 ≠ 1, X2 ≠ 0, X1 ≠ 1} and {X1 ≠ 1, X3 ≠ 1, X2 ≠ 0} are paths in the extension matrix above
Conjunctive Formulas
A path $\{r_{1j_1}, \ldots, r_{nj_n}\}$ in the $EM_k$ of the positive example $k$ against NEM corresponds to a conjunctive formula or cover
$L = \bigwedge_{i=1}^{n} [X_{j_i} \neq r_{ij_i}]$
Path: {X1 ≠ 1, X2 ≠ 0, X1 ≠ 1}
Formula: X1 ≠ 1 ∧ X2 ≠ 0 ∧ X1 ≠ 1
Path: {X1 ≠ 1, X3 ≠ 1, X2 ≠ 0}
Formula: X1 ≠ 1 ∧ X3 ≠ 1 ∧ X2 ≠ 0
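To make the path-to-formula correspondence concrete, the sketch below enumerates every path of a small extension matrix and prints the conjunctive formula each one yields. The 3 x 3 matrix is the one implied by the two example paths on the slides (its dead-element positions are inferred, not given explicitly), and the helper names `paths` and `formula` are introduced here for illustration only.

```python
from itertools import product

DEAD = "*"  # dead-element marker (assumption of this sketch)

def paths(em):
    """Return every path: one non-dead (column, value) pair per row."""
    per_row = [[(j, v) for j, v in enumerate(row) if v != DEAD] for row in em]
    return product(*per_row)  # yields nothing if some row is entirely dead

def formula(path, names):
    """Conjunction of selectors [X != v]; repeated selectors are merged."""
    selectors = sorted(set(path))
    return " AND ".join(f"[{names[j]} != {v}]" for j, v in selectors)

em = [
    [1,    DEAD, DEAD],
    [DEAD, 0,    1],
    [1,    0,    DEAD],
]
names = ["X1", "X2", "X3"]

all_formulas = sorted({formula(p, names) for p in paths(em)})
for f in all_formulas:
    print(f)
```

Merging repeated selectors is a design choice of this sketch: the path {X1 ≠ 1, X2 ≠ 0, X1 ≠ 1} contains the same selector twice, and the formula needs it only once.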
Extension Matrix Disjunction
Disjunction Matrix
$EMD = (r_{ij})_{n \times a}$
$r_{ij} = *$ when $\exists k_1 \in \{i_1, \ldots, i_k\} : EM_{k_1}(i,j) = *$
$r_{ij} = \bigvee_{k_2=1}^{k} EM_{i_{k_2}}(i,j) = NEM(i,j)$ otherwise
A path $\{r_{1j_1}, \ldots, r_{nj_n}\}$ in the EMD of $\{e^+_{i_1}, \ldots, e^+_{i_k}\}$ against NE corresponds to a conjunctive formula or cover
$L = \bigwedge_{i=1}^{n} [X_{j_i} \neq r_{ij_i}]$
which covers all of $\{e^+_{i_1}, \ldots, e^+_{i_k}\}$ against NE, and vice versa
EMD Example
Negative Example Matrix (NEM)
rainy  hot   high    windy
rainy  cool  normal  windy
sunny  hot   normal  windy
sunny  mild  high    windy
EMD Example
Extension Matrix Disjunction (EMD)
rainy  hot   *       *
rainy  cool  normal  *
sunny  hot   normal  *
sunny  *     *       *
Positive Example
overcast  mild  high  windy
EMD Example
Extension Matrix Disjunction (EMD)
rainy  hot   *  *
rainy  cool  *  *
sunny  hot   *  *
sunny  *     *  *
Positive Example
overcast  mild  normal  calm
EMD Example
Extension Matrix Disjunction (EMD)
*      *     *  *
*      cool  *  *
sunny  *     *  *
sunny  *     *  *
Positive Example
rainy  hot  high  calm
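The three EMD slides can be replayed with a short sketch. It assumes the same list-of-lists representation and `*` marker as the slides; the function names `extension_matrix`, `disjunction`, and `has_path` are introduced here and are not taken from the HCV implementation. A path exists exactly when every row keeps at least one non-dead element.

```python
DEAD = "*"  # dead-element marker (assumption of this sketch)

def extension_matrix(nem, positive):
    """Dead where the positive example agrees with the negative example."""
    return [[DEAD if v == positive[j] else v for j, v in enumerate(row)]
            for row in nem]

def disjunction(em_a, em_b):
    """Element-wise: dead if dead in either matrix, otherwise the NEM value."""
    return [[DEAD if DEAD in (x, y) else x for x, y in zip(ra, rb)]
            for ra, rb in zip(em_a, em_b)]

def has_path(em):
    """A path exists iff no row is entirely dead."""
    return all(any(v != DEAD for v in row) for row in em)

nem = [
    ["rainy", "hot",  "high",   "windy"],
    ["rainy", "cool", "normal", "windy"],
    ["sunny", "hot",  "normal", "windy"],
    ["sunny", "mild", "high",   "windy"],
]
positives = [
    ["overcast", "mild", "high",   "windy"],
    ["overcast", "mild", "normal", "calm"],
    ["rainy",    "hot",  "high",   "calm"],
]

ems = [extension_matrix(nem, e) for e in positives]
emd12 = disjunction(ems[0], ems[1])
emd123 = disjunction(emd12, ems[2])

print(has_path(emd12))   # the first two positive examples intersect
print(has_path(emd123))  # adding the third leaves row 1 entirely dead
```

In the last matrix the first row is entirely dead, so no single conjunctive formula separates all three positive examples from the negative examples.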
MFL and MCV (1)
The minimum formula problem (MFL)
Generating a conjunctive formula that covers a
positive example or an intersecting group of
positive examples against NEM and has the
minimum number of different conjunctive
selectors
The minimum cover problem (MCV)
Seeking a cover that covers all positive examples
in PE against NEM and has the minimum number
of conjunctive formulae with each conjunctive
formula being as short as possible
MFL and MCV (2)
NP-hard
Two complete algorithms are designed to solve them when each attribute domain $D_i$ ($i \in \{1, \ldots, a\}$) satisfies $|D_i| = 2$
$O(na2^{a})$ for MFL
$O(n^{2}a4^{a} + pa^{2}4^{a})$ for MCV
When $|D_i| > 2$, the domain can be decomposed into several, each having base 2
AE1 Heuristic
Starting search from columns with the most non-dead elements
Simplifying redundancy by deductive inference
rules in mathematical logic
Problems with AE1
Can easily lose the optimum solution
[Figure: extension matrix over X1, X2, X3; cell positions were lost in extraction]
Here, AE1 will select [X2 ≠ 0], [X1 ≠ 1], and [X3 ≠ 1], instead of [X1 ≠ 1] and [X3 ≠ 1]
Simplifying redundancy for MFL and MCV is itself NP-hard
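The failure mode can be reproduced with a small sketch. Two things here are assumptions: the 6 x 3 matrix is hypothetical (the slide's own matrix is not recoverable from the transcript), and `greedy_cover` is only a simplified reading of "start from the columns with the most non-dead elements", without AE1's redundancy simplification step.

```python
DEAD = "*"  # dead-element marker (assumption of this sketch)

def greedy_cover(em):
    """Repeatedly pick the column with the most non-dead elements
    among the rows that are not yet covered; return chosen column indices."""
    uncovered = set(range(len(em)))
    chosen = []

    def live_rows(col):
        return {i for i in uncovered if em[i][col] != DEAD}

    while uncovered:
        best = max(range(len(em[0])), key=lambda c: len(live_rows(c)))
        rows = live_rows(best)
        if not rows:
            raise ValueError("a row has only dead elements: no path exists")
        chosen.append(best)
        uncovered -= rows
    return chosen

# Hypothetical matrix: X2 has the most non-dead elements,
# but rows 0 and 5 can only be covered by X1 and X3 respectively.
em = [
    [1,    DEAD, DEAD],
    [1,    0,    DEAD],
    [1,    0,    DEAD],
    [DEAD, 0,    1],
    [DEAD, 0,    1],
    [DEAD, DEAD, 1],
]

print(greedy_cover(em))  # column indices in selection order: X2, X1, X3
```

The greedy choice spends a selector on X2 even though X1 and X3 together already cover every row, which is the kind of lost optimum the slide describes.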
What is HCV ?
HCV is an extension-matrix-based rule induction algorithm which is
Heuristic
Attribute-based
Noise-tolerant
Divides the positive examples into intersecting groups.
Uses HFL heuristics to find a conjunctive formula which covers each intersecting group.
Low-order polynomial time complexity at induction time
HCV Issues
The HCV algorithm
The HFL heuristics
Speed and efficiency
Noise handling capabilities
Dealing with numeric and nominal data
Accuracy and description compactness
HCV Algorithm (1)
Procedure HCV(EM1, ..., EMp; Hcv)
  integer n, a, p
  matrix EM1(n,a), ..., EMp(n,a), D(p)
  set Hcv
S1: D ← 0       /* D(j) = 1 (j = 1, ..., p) indicates that EMj has been put into an intersecting group */
S2: Hcv ← {}    /* initialization */
    for i = 1 to p, do
      if D(i) = 0 then
      { EM ← EMi
HCV Algorithm (2)
        for j = i+1 to p, do
          if D(j) = 0 then
          { EM2 ← EM ∨ EMj
            if there exists at least one path in EM2
              then { EM ← EM2, D(j) ← 1 }
          }
        next j
        call HFL(EM; Hfl)
        Hcv ← Hcv ∨ Hfl
      }
    next i
    Return (Hcv)
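The grouping loop of the procedure can be sketched in Python as follows. This is an illustration under stated assumptions: `first_path` is a deliberately naive stand-in for HFL (it takes the first non-dead element of each row, so its formulas are valid but not minimal), and the three extension matrices are those of the Play / Don't Play example from the earlier slides.

```python
D = "*"  # dead-element marker (assumption of this sketch)

def disjunction(em_a, em_b):
    """Element-wise disjunction: dead if dead in either matrix."""
    return [[D if D in (x, y) else x for x, y in zip(ra, rb)]
            for ra, rb in zip(em_a, em_b)]

def has_path(em):
    return all(any(v != D for v in row) for row in em)

def first_path(em):
    """Stand-in for HFL: first non-dead (column, value) pair of every row."""
    return sorted({next((j, v) for j, v in enumerate(row) if v != D)
                   for row in em})

def hcv(ems, hfl=first_path):
    grouped = [False] * len(ems)      # the vector D of the procedure
    cover = []                        # Hcv
    for i in range(len(ems)):
        if grouped[i]:
            continue
        em = ems[i]
        for j in range(i + 1, len(ems)):
            if grouped[j]:
                continue
            em2 = disjunction(em, ems[j])
            if has_path(em2):         # EMj joins the intersecting group
                em, grouped[j] = em2, True
        cover.append(hfl(em))         # one conjunctive formula per group
    return cover

em1 = [["rainy", "hot", D, D], ["rainy", "cool", "normal", D],
       ["sunny", "hot", "normal", D], ["sunny", D, D, D]]
em2 = [["rainy", "hot", "high", "windy"], ["rainy", "cool", D, "windy"],
       ["sunny", "hot", D, "windy"], ["sunny", D, "high", "windy"]]
em3 = [[D, D, D, "windy"], [D, "cool", "normal", "windy"],
       ["sunny", D, "normal", "windy"], ["sunny", "mild", D, "windy"]]

cover = hcv([em1, em2, em3])
for formula in cover:
    print(formula)
```

The first two positive examples end up in one group and the third forms its own, so the cover holds two formulas; each is a list of (column index, excluded value) pairs.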
HFL - Fast Strategy
X1      X2      X3     X4    X5
absent  slight  strip  *     normal
*       *       hole   fast  dry peep
low     slight  strip  *     normal
absent  slight  spot   fast  dry peep
low     medium  *      fast  normal
Selector [X5 ≠ {normal, dry peep}] can be a possible selector, which will cover all 5 rows
HFL - Precedence
[Figure: extension matrix over X1, X2, X3; cell positions were lost in extraction]
Selectors [X1 ≠ 1] and [X3 ≠ 1] are two inevitable selectors in the above extension matrix
HFL - Elimination
[Figure: extension matrix over X1, X2, X3; cell positions were lost in extraction]
Attribute X2 can be eliminated by X3
HFL - Least Frequency
[Figure: extension matrix over X1, X2, X3; cell positions were lost in extraction]
Attribute X1 can be eliminated and there still exists a path
HFL Algorithm (1)
Procedure HFL(EM; Hfl)
S0: Hfl ← {}
S1: /* the fast strategy */
Try the fast strategy on all those rows which haven't been covered;
If successful, add a corresponding selector to Hfl and return(Hfl)
S2: /* the precedence strategy */
Apply the precedence strategy to the uncovered rows;
If some inevitable selectors are found, add them to Hfl, label all the rows they cover, and go to S1
HFL Algorithm (2)
S3: /* the elimination strategy */
Apply the elimination strategy to those attributes that have neither been selected nor eliminated;
If an eliminable selector is found, reset all the elements in the corresponding column with *, and go to S2.
S4: /* the least-frequency strategy */
Apply the least-frequency strategy to those attributes which have neither been selected nor eliminated, and find a least-frequency selector;
Reset all the elements in the corresponding column with *, and go to S2.
Return(Hfl)
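A compact sketch of the four strategies, written under several assumptions, may help in following the steps. The slides do not spell out the exact tests, so this sketch assumes: fast = a column with no dead element in any uncovered row; precedence = a row whose only non-dead element sits in one column; elimination = a column whose non-dead rows are a subset of another column's; least frequency = the column with the fewest non-dead elements. Ties are broken by taking the leftmost column, which is also an assumption. The matrix is the final EMD of the pneumonia / tuberculosis example used later in these slides.

```python
D = "*"  # dead-element marker (assumption of this sketch)

def hfl(em):
    """Return {column index: set of excluded values} covering every row."""
    em = [list(row) for row in em]            # work on a copy
    n_cols = len(em[0])
    uncovered = set(range(len(em)))
    selectors = {}

    def live(col):                            # uncovered rows, non-dead in col
        return {i for i in uncovered if em[i][col] != D}

    def select(col):
        rows = live(col)
        selectors.setdefault(col, set()).update(em[i][col] for i in rows)
        uncovered.difference_update(rows)

    def kill(col):                            # reset a column to dead elements
        for row in em:
            row[col] = D

    while uncovered:
        # S1: fast strategy
        full = [c for c in range(n_cols) if live(c) == uncovered]
        if full:
            select(full[0])
            break
        # S2: precedence strategy
        forced = set()
        for i in uncovered:
            cols = [c for c in range(n_cols) if em[i][c] != D]
            if not cols:
                raise ValueError(f"row {i} is entirely dead: no path")
            if len(cols) == 1:
                forced.add(cols[0])
        if forced:
            for c in sorted(forced):
                select(c)
            continue
        # S3: elimination strategy, else S4: least-frequency strategy
        cands = [c for c in range(n_cols) if live(c)]
        dominated = next((c for c in cands for d in cands
                          if c != d and live(c) <= live(d)), None)
        kill(dominated if dominated is not None
             else min(cands, key=lambda c: len(live(c))))
    return selectors

# Columns: Fever, Cough, X Ray, ESR, Auscultation
emd = [
    ["absent", D, "strip", D,      "normal"],
    [D,        D, "hole",  "fast", D],
    [D,        D, "strip", D,      "normal"],
    ["absent", D, D,       "fast", D],
    [D,        D, D,       "fast", "normal"],
]
result = hfl(emd)
print(result)
```

On this matrix the sketch eliminates Fever by least frequency, finds ESR ≠ fast by precedence, and then covers the two remaining rows with the fast strategy. Because of the leftmost tie-break it picks X Ray ≠ strip for that last step, whereas the slides pick Auscultation ≠ normal; both selectors cover the same two rows.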
Complexity of HFL
S1 - O(na)
S2 - O(na)
S3 - $O(na^2)$
S4 - O(na)
Overall - $O(a(na + na + na^2 + na)) = O(na^3)$
Complexity of HCV
Worst case time complexity
$O\left(\sum_{i=1}^{p}\left(na + \sum_{j=i+1}^{p}(2na + na + na + 1) + (na^{3}) + 1\right)\right) = O(pna^{3} + p^{2}na)$
Space requirement: $2na$
HCV Example
Fever   Cough   X Ray  ESR     Auscultation  Disease
high    heavy   flack  normal  bubble like   Pneumonia
medium  heavy   flack  normal  bubble like   Pneumonia
low     slight  spot   normal  dry peep      Pneumonia
high    medium  flack  normal  bubble like   Pneumonia
medium  slight  flack  normal  bubble like   Pneumonia
absent  slight  strip  normal  normal        Tuberculosis
high    heavy   hole   fast    dry peep      Tuberculosis
low     slight  strip  normal  normal        Tuberculosis
absent  slight  spot   fast    dry peep      Tuberculosis
low     medium  flack  fast    normal        Tuberculosis
HCV Example
NEM
Fever   Cough   X Ray  ESR     Auscultation
absent  slight  strip  normal  normal
high    heavy   hole   fast    dry peep
low     slight  strip  normal  normal
absent  slight  spot   fast    dry peep
low     medium  flack  fast    normal
HCV Example
EM1
Fever   Cough   X Ray  ESR   Auscultation
absent  slight  strip  *     normal
*       *       hole   fast  dry peep
low     slight  strip  *     normal
absent  slight  spot   fast  dry peep
low     medium  *      fast  normal
Positive Example 1
high  heavy  flack  normal  bubble like
HCV Example
EM2
Fever   Cough   X Ray  ESR   Auscultation
absent  slight  strip  *     normal
high    *       hole   fast  dry peep
low     slight  strip  *     normal
absent  slight  spot   fast  dry peep
low     medium  *      fast  normal
Positive Example 2
medium  heavy  flack  normal  bubble like
HCV Example
EM3
Fever   Cough   X Ray  ESR   Auscultation
absent  *       strip  *     normal
high    heavy   hole   fast  *
*       *       strip  *     normal
absent  *       *      fast  *
*       medium  flack  fast  normal
Positive Example 3
low  slight  spot  normal  dry peep
HCV Example
EM4
Fever   Cough   X Ray  ESR   Auscultation
absent  slight  strip  *     normal
*       heavy   hole   fast  dry peep
low     slight  strip  *     normal
absent  slight  spot   fast  dry peep
low     *       *      fast  normal
Positive Example 4
high  medium  flack  normal  bubble like
HCV Example
EM5
Fever   Cough   X Ray  ESR   Auscultation
absent  *       strip  *     normal
high    heavy   hole   fast  dry peep
low     *       strip  *     normal
absent  *       spot   fast  dry peep
low     medium  *      fast  normal
Positive Example 5
medium  slight  flack  normal  bubble like
HCV Example
EM1 ∨ EM2
Fever   Cough   X Ray  ESR   Auscultation
absent  slight  strip  *     normal
*       *       hole   fast  dry peep
low     slight  strip  *     normal
absent  slight  spot   fast  dry peep
low     medium  *      fast  normal
HCV Example
EM1 ∨ EM2 ∨ EM3
Fever   Cough   X Ray  ESR   Auscultation
absent  *       strip  *     normal
*       *       hole   fast  *
*       *       strip  *     normal
absent  *       *      fast  *
*       medium  *      fast  normal
HCV Example
EM1 ∨ EM2 ∨ EM3 ∨ EM4
Fever   Cough  X Ray  ESR   Auscultation
absent  *      strip  *     normal
*       *      hole   fast  *
*       *      strip  *     normal
absent  *      *      fast  *
*       *      *      fast  normal
HCV Example
EM1 ∨ EM2 ∨ EM3 ∨ EM4 ∨ EM5
Fever   Cough  X Ray  ESR   Auscultation
absent  *      strip  *     normal
*       *      hole   fast  *
*       *      strip  *     normal
absent  *      *      fast  *
*       *      *      fast  normal
HCV Example
HFL Step 1: Fast Strategy
Fever   Cough  X Ray  ESR   Auscultation
absent  *      strip  *     normal
*       *      hole   fast  *
*       *      strip  *     normal
absent  *      *      fast  *
*       *      *      fast  normal
HFL Rules = {}
HCV Example
HFL Step 2: Precedence
Fever   Cough  X Ray  ESR   Auscultation
absent  *      strip  *     normal
*       *      hole   fast  *
*       *      strip  *     normal
absent  *      *      fast  *
*       *      *      fast  normal
HFL Rules = {}
HCV Example
HFL Step 3: Elimination
Fever   Cough  X Ray  ESR   Auscultation
absent  *      strip  *     normal
*       *      hole   fast  *
*       *      strip  *     normal
absent  *      *      fast  *
*       *      *      fast  normal
HFL Rules = {}
HCV Example
HFL Step 4: Least-Frequency
Fever   Cough  X Ray  ESR   Auscultation
absent  *      strip  *     normal
*       *      hole   fast  *
*       *      strip  *     normal
absent  *      *      fast  *
*       *      *      fast  normal
HFL Rules = {}
HCV Example
HFL Step 4: Least-Frequency
Fever  Cough  X Ray  ESR   Auscultation
*      *      strip  *     normal
*      *      hole   fast  *
*      *      strip  *     normal
*      *      *      fast  *
*      *      *      fast  normal
HFL Rules = {}
HCV Example
HFL Step 2: Precedence
Fever  Cough  X Ray  ESR   Auscultation
*      *      strip  *     normal
*      *      hole   fast  *
*      *      strip  *     normal
*      *      *      fast  *
*      *      *      fast  normal
HFL Rules = {ESR ≠ fast}
HCV Example
HFL Step 2: Precedence
Fever  Cough  X Ray  ESR  Auscultation
*      *      strip  *    normal
*      *      *      *    *
*      *      strip  *    normal
*      *      *      *    *
*      *      *      *    *
HFL Rules = {ESR ≠ fast}
HCV Example
HFL Step 1: Fast Strategy
Fever  Cough  X Ray  ESR  Auscultation
*      *      strip  *    normal
*      *      *      *    *
*      *      strip  *    normal
*      *      *      *    *
*      *      *      *    *
HFL Rules = {ESR ≠ fast, Auscultation ≠ normal}
HCV Example
HFL Step 1: Fast Strategy
Fever  Cough  X Ray  ESR  Auscultation
*      *      *      *    normal
*      *      *      *    *
*      *      *      *    normal
*      *      *      *    *
*      *      *      *    *
HFL Rules = {ESR ≠ fast, Auscultation ≠ normal}
HCV Example
HCV generated rule
C4.5rules generated rule
HCV versus AE1
The use of disjunctive matrix
Reasonable solution to MFL and MCV
Noise handling
Discretization of attributes
HCV Noise Handling
Don’t care values are dead elements
Approximate partitioning
Stopping criteria
Discretization of Attributes
Information Gain Heuristic
Stop splitting criteria
Stop if the information gain on all cut points is the same.
Stop if the number of examples to split is less than a certain number.
Limit the total number of intervals.
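The information gain heuristic for picking a cut point can be sketched as below. The slides do not give HCV's exact discretization procedure, so this is only an assumption-laden illustration: candidate cuts are taken at midpoints between adjacent distinct values, the function names `entropy` and `best_cut` are introduced here, and the sample readings are made up for the demonstration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())

def best_cut(values, labels):
    """Return (cut point, information gain) of the best binary split,
    or None when all values are identical."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue                      # no cut between equal values
        left = [lab for _, lab in pairs[:k]]
        right = [lab for _, lab in pairs[k:]]
        remainder = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        cut = (pairs[k - 1][0] + pairs[k][0]) / 2
        if best is None or gain > best[1]:
            best = (cut, gain)
    return best

# Made-up numeric readings with two classes.
values = [36.5, 37.0, 38.5, 39.0]
labels = ["low", "low", "high", "high"]
cut, gain = best_cut(values, labels)
print(cut, gain)
```

A full discretizer would apply `best_cut` recursively to each resulting interval and stop according to the three criteria listed above; those thresholds are left out of this sketch.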
Comparison (1)
Table 1: Number of rules and conditions using the Monk 1, 2 and 3 datasets as training sets 1, 2 and 3 respectively
Algorithm                 Training Set 1       Training Set 2       Training Set 3
                          rules  conditions    rules  conditions    rules  conditions
ID3                       53     216           105    498           30     98
C4.5                      60     262           113    566           27     89
C4.5 with grouping        9      31            55     353           20     102
C4.5rules                 31     101           97     374           23     65
C4.5rules with grouping   8      19            46     188           11     35
NewID                     21     143           59     401           18     101
HCV                       7      16            39     168           18     62
Comparison (2)
Table 2: Accuracy
Algorithm                 Test Set 1  Test Set 2  Test Set 3
ID3                       83.3%       68.3%       94.4%
C4.5                      82.4%       69.7%       90.3%
C4.5 with grouping        100%        82.4%       93.1%
C4.5rules                 92.4%       75.7%       85.4%
C4.5rules with grouping   100%        81.0%       91.4%
NewID                     93%         78%         89%
HCV                       100%        81.7%       90.3%
Comparison (3)
Conclusions
Rules generated in HCV take the form of
variable-valued logic rules, rather than decision
trees
HCV generates very compact rules in low-order
polynomial time
Noise handling and discretization
Predictive accuracy comparable to the ID3
family of algorithms viz., C4.5, C4.5rules